A TRACE-DRIVEN SIMULATOR FOR PALM OS DEVICES
by
Hyrum D. Carroll
A thesis submitted to the faculty of
Brigham Young University
in partial fulfillment of the requirements for the degree of
Master of Science
Department of Computer Science
Brigham Young University
September 2004
Copyright © 2004 Hyrum D. Carroll
All Rights Reserved
BRIGHAM YOUNG UNIVERSITY
GRADUATE COMMITTEE APPROVAL
of a thesis submitted by
Hyrum D. Carroll
This thesis has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory.
Date J. Kelly Flanagan, Chair
Date Charles D. Knutson
Date Bryan S. Morse
BRIGHAM YOUNG UNIVERSITY
As chair of the candidate’s graduate committee, I have read the thesis of Hyrum D. Carroll in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library.
Date J. Kelly Flanagan, Chair, Graduate Committee
Accepted for the Department
David W. Embley, Graduate Coordinator
Accepted for the College
G. Rex Bryce, Associate Dean, College of Physical and Mathematical Sciences
ABSTRACT
A TRACE-DRIVEN SIMULATOR FOR PALM OS DEVICES
Hyrum D. Carroll
Department of Computer Science
Master of Science
Due to the high cost of producing hardware prototypes, software simulators are
typically used to determine the performance of proposed systems. To accurately
represent a system with a simulator, the simulator inputs need to be representative
of actual system usage. Trace-driven simulators that use logs of actual usage are
generally preferred by researchers and developers to other types of simulators to
determine expected performance.
In this thesis we explain the design and results of a trace-driven simulator for
Palm OS devices capable of starting in a specified state and replaying a log of inputs
originally generated on a handheld. We collect the user inputs with an acceptable
amount of overhead while a device is executing real applications in normal operating
environments. We based our simulator on the deterministic state machine model. The
model specifies that two equivalent systems that start in the same state and have the
same inputs applied, follow the same execution paths. By replaying the collected
inputs we are able to collect traces and performance statistics from the simulator
that are representative of actual usage with minimal perturbation.
Our simulator can be used to evaluate various hardware modifications to Palm OS
devices such as adding a cache. At the end of this thesis we present an in-depth case
study analyzing the expected memory performance from adding a cache to a Palm
m515 device.
ACKNOWLEDGMENTS
Many thanks are due to the individuals who contributed to this thesis in one form or
another. Satish Baniya helped in the development of the input collection methods.
Myles Watson lent his eyes to numerous problems and his mind to making the project
better. Kelly Flanagan has inspired me and kept me on track with his vision. My
wife, Melissa, has lent her daily support and encouragement throughout the entire
project. And ultimately God, for His inspiration and solutions.
TABLE OF CONTENTS
1 Introduction
1.1 Previous Work
1.2 Our Solution
1.3 Thesis Outline
2 System Architecture
2.1 High-Level Overview
2.2 Initial State Collection
2.3 Activity Logs
2.3.1 Handheld Inputs
2.3.2 Hacks
2.3.3 Evaluation of Activity Logs
2.3.4 Summary of Activity Logs
2.4 Playback of Activity Logs
2.4.1 The Palm OS Emulator
2.4.2 Modifications To POSE
2.4.3 Simulator Setup
2.4.4 Limitations
3 System Validation
3.1 Test Setup
3.2 Test Workloads
3.3 Activity Log Correlation
3.4 Final State Correlation
4 Cache Case Study
4.1 Introduction
4.2 Experimental Setup
4.3 Cache Simulation Results
4.4 Conclusion
5 Conclusion
5.1 Future Work
LIST OF TABLES
4.1 Volunteer User Session Data
LIST OF FIGURES
2.1 Application Of The Deterministic State Machine Model
2.2 How A Hack Works
2.3 The genericEventRecord Data Structure
2.4 The penEventRecord Data Structure
2.5 The keyCurrentStateCallRecord Data Structure
2.6 The randomCallRecord Data Structure
2.7 Average Overhead For The EvtEnqueueKey Hack
2.8 Average Overhead For Each Hack
3.1 Palm’s Puzzle Game
4.1 CPU And Memory Performance
4.2 A Simple Memory Hierarchy
4.3 Components Of A Cache Address
4.4 Palm Memory Hierarchy
4.5 Miss Rates For 56 Cache Configurations
4.6 Average Effective Memory Access Times
4.7 Miss Rates For A Desktop Address Trace
Chapter 1
Introduction
Before a new computer system is implemented in hardware, developers often use soft-
ware simulators to predict the performance of the new design. A detailed system
simulator subjects a virtual system to representative workloads for accurate evalua-
tion. Often a trace, or a sequence of system events recorded from an actual system,
is used as input to the simulator. Simulators that use traces for input are known as
trace-driven simulators.
Developers and researchers have demonstrated the utility of trace-driven simu-
lation for desktop and server systems. They have used this technique to evaluate
memory hierarchies, processor performance, etc. [1, 2, 3, 4, 5]. Rather than work-
ing on solutions to the well-known problems of collecting traces for desktops and
servers [6, 7, 8, 9], we undertook to collect traces in an interactive and mobile envi-
ronment.
In a mobile environment, the user is free to move around while still operating the
handheld and will use the device differently than a stationary computer. We observed
that mobile devices are typically used for short periods of time (less than a minute)
followed by short or long periods of inactivity, with occasional periods of heavy usage.
Often users will turn on the device just to look up a phone number or to check their
calendar. This type of sporadic usage is not characteristic of desktop use.
Since Palm OS is used on more handheld devices than any other operating sys-
tem [10], we focused on collecting traces for Palm OS devices. These devices are
handheld computers running the Palm OS, which has a “preemptive multitasking
kernel that provides basic task management” [11, 12]. Tens of millions of handhelds
around the world run Palm OS, with over a million units shipped in the first quarter
of 2004 alone [10].
Collecting these traces is a multi-step process. First, we instrument a Palm
OS handheld to record external inputs, referred to as an activity log. Next, we transfer
this log to a desktop computer. Using a simulator that we developed, we replay
the collected activity logs. We use the terms replay and playback interchangeably
to refer to the simulator’s reproduction of an execution of events from an activity
log. Furthermore, we instrumented the simulator to collect traces and performance
metrics during playback.
One of the key issues with a monitoring system is the amount of perturbation
introduced. An ideal monitoring procedure would not perturb the system at all.
Quantifying the perturbation helps to determine how close to ideal a given monitoring
system is. In this thesis we quantify the amount of perturbation with the amount
of overhead introduced, where overhead is the difference in execution time between
the instrumented system and the original system. If the monitoring procedure alters
the way in which a user operates a system, for example, by perceivably increasing
the amount of time required to process inputs, the overhead is unacceptable. On the
other hand, if a system can be monitored in such a way that the user can not perceive
the introduction of the monitoring system, the amount of the overhead is acceptable.
1.1 Previous Work
Gannamaraju and Chandra’s work with Palmist [13] provided a foundation for
our work of collecting user inputs. They collected system activity on mobile hand-
helds by recording the occurrence of 80% (707 of the 880) of the Palm OS 3.5 system
calls with hacks. (In the Palm OS community, hacks explicitly refer to system exten-
sions, and are described in detail in Section 2.3.2.) Due to their collection technique,
Gannamaraju and Chandra were unable to generate a complete sequence of system
activity. Since they recorded the occurrence of most of the system calls, their method
introduced a large amount of overhead. They reported that the time required for each
system call to execute increased by two or more orders of magnitude with Palmist
running compared to a system not running Palmist. This amount of overhead is
unacceptable. Furthermore, their technique imposes significant memory requirements.
For example, they generated 1.34 MB of records on the handheld to perform a set of
tasks that requires about one minute of execution without Palmist running. Given
that the maximum amount of memory found on Palm OS devices for which they
did their work is between 8 and 16 MB, only a few minutes of execution could be
traced. Using an external memory device to increase storage capacity would increase
the already high amount of overhead.
Rose and Flanagan’s work with CITCAT (Constructing Instruction Traces from
Cache-filtered Address Traces) [14] inspired the model for our simulator. The CIT-
CAT procedure defines the mechanisms used to collect and replay an initial memory
state image and important events. Rose and Flanagan implemented CITCAT and
produced complete instruction traces of desktop workloads. Flanagan defined com-
plete instruction traces to be “those traces containing all CPU generated references
including those produced by interrupts, system calls, exception handlers, other su-
pervisor activities, and user processes” [6]. To do this, they collected the “state of
the processor, caches, main memory, ... hardware interrupts, DMA, and other asyn-
chronous or peripheral events that influence the processor” [15] with the aid of a
hardware monitor such as a logic analyzer. They then initialized the simulator with
the collected initial machine state image, and replayed the asynchronous events. Dur-
ing playback, they collected performance data and instruction and address traces. In
their paper, Rose and Flanagan demonstrated that CITCAT can collect complete
traces of desktop workloads. However, they used a large and expensive logic analyzer
which is impractical for real-time mobile collection.
External hardware monitors, such as logic analyzers and oscilloscopes, signifi-
cantly alter the user’s behavior by restricting the mobility of the user. To use an
external monitor requires connecting probes to various pins to gather data. Hand-
helds monitored with such equipment have to be handled very carefully to keep the
probes attached. Furthermore, attaching a logic analyzer or oscilloscope to a hand-
held would greatly restrict the mobility of the device.
Several other studies besides Palmist have been performed on handheld devices,
but have focused on evaluating the energy consumption and efficiency. Flinn and
Satyanarayanan [16] collected information about handheld devices using oscilloscopes.
Cycle-accurate simulators [17, 18, 19] have also been used to estimate energy consump-
tion of handheld devices. For a simulator to produce accurate estimates, it needs
to be fed input representative of real usage. Cignetti, Komarov and Ellis’ work [20] is
unique in that they utilize a modified version of the Palm OS Emulator. They limited
their studies to energy consumption estimates.
1.2 Our Solution
Our approach unites Rose and Flanagan’s CITCAT [14] and Gannamaraju and
Chandra’s Palmist [13]. We borrow Rose and Flanagan’s fundamental model of repre-
senting a computer system as a deterministic state machine and apply it to handheld
devices. They showed that determining the execution path of a state machine, or a
computer system, requires the initial state and sequence of inputs. We collect the
initial state of a mobile device by storing its memory contents on a desktop com-
puter. To collect the sequence of inputs we use a collection technique similar to that
used by Gannamaraju and Chandra, but with orders of magnitude less overhead. To
illustrate the usefulness of our simulator, we present a case study demonstrating the
expected performance of adding cache memory to a handheld device. This case study
would not have been possible with previous data collection techniques for handheld
devices.
1.3 Thesis Outline
The rest of this thesis proceeds as follows: Chapter 2 explains the architecture of
our system by first describing it from a high-level perspective, and then elaborating
on the different components. The collection of the initial state, activity logs and
their playback will all be covered in this chapter. Chapter 3 illustrates two different
methods employed to validate the system. In the first method, we compare the
activity log of the user’s session and that of the emulated session. For the second
method, we compare the final state of an emulated session with the final state of
the handheld. In Chapter 4 we demonstrate the usefulness of the simulator and
present the expected cache memory performance for a Palm m515. The final chapter
concludes the thesis and describes future work.
Chapter 2
System Architecture
This chapter explains how we collect external inputs from a Palm OS device and
use this data as input to our trace-driven simulator. We first explain our model, in
general terms, then we describe our implementation of it. The model is divided into
three components. First, the model requires the initial state of the system (comprised
of the memory, processor, etc.). The second component is the sequence of inputs to
the system. We refer to the log of user inputs, along with the corresponding timing
information, as an activity log. The final component is an emulator which starts in
the initial state and processes the inputs to the system. We then explain and evaluate
the implementation of each of these components.
2.1 High-Level Overview
Since a computer processor is deterministic, we can use the deterministic state
machine model to represent it, as did Rose and Flanagan [15, 14]. First, let S₀ be an
initial state and I be a sequence of external inputs. The model specifies that every
time a deterministic state machine starts in S₀ and I is applied, the machine always
follows the same sequence of states and ends in the same final state. Furthermore,
if machine A and machine B are equivalent deterministic state machines, and both
start in S₀, and I is applied to both machines, then both machines A and B will follow
the same path of execution and end in the same state.
Figure 2.1: Application Of The Deterministic State Machine Model.
The deterministic state machine model applies to situations in which the workload
activity performed on system Suser is replicated on system Semulated (see Figure 2.1).
First, let Suser be a system instrumented to record the initial state and all external
inputs. Also, let Semulated be an equivalent system (e.g., an emulator or simulator) of
Suser that can replay the events from Suser in an environment in which measurements
can be made. At the end of a session (the period of time that inputs are collected) we
transfer the initial state and collected inputs from Suser to Semulated. Then Semulated is
started in the initial state of Suser for that session. Semulated then processes and replays
the collected activity log from Suser. Performance measurements and workload traces
can be generated internally or externally on Semulated. Since statistics and data are
gathered on Semulated while the activity log is replayed, they represent the original
execution by the user without system perturbation.
For our implementation of the model, Suser is a handheld computer operated
by a human. We instrumented the handheld to collect the initial state and user
inputs (i.e., pen movements and button presses). We then transferred the data for a
session to a desktop computer. Semulated is a Palm OS device emulator that accurately
models handheld computer devices. We execute the emulator on a desktop computer
and use the initial state and activity log from Suser for playback. During playback,
the emulator follows the same sequence of states as the handheld. By applying the
deterministic state machine model to handhelds, we can collect the initial state and
inputs of a device and replay them with an emulator on a desktop computer.
The following steps represent the chronological order of the methods we used to
collect the relevant information from a handheld and replay it on an emulator:
• Instrument a handheld to collect user inputs
• Transfer the initial state of a handheld to the desktop
• Start collecting inputs
• Allow the user to operate the handheld normally
• Transfer the activity log from the handheld to the desktop
• Load the emulator with the collected initial state of the handheld
• Collect processor information while replaying the activity log on the desktop
We explain each of these steps in the following sections.
2.2 Initial State Collection
Before we give a handheld to a volunteer user, we instrument it to collect user
inputs, capture its initial state, and transfer the state information to a desktop com-
puter. As described above, an initial state is necessary to recreate accurate playback
of a session. For an initial state to be accurate, it must be complete. For a handheld,
the initial state includes the contents of the flash memory, the static RAM and the
current state of the processor (current point of execution and registers).
We use ROMTransfer.prc, an application freely distributed by PalmSource [21],
to transfer a flash image from a Palm OS device to a desktop with a USB cable. To
get the necessary contents of the RAM, we set the backup bit for all the applications
and databases, and perform a HotSync [22]. HotSyncing synchronizes the programs
and databases on a handheld with copies on the desktop over a USB cable.
We chose to start every session directly after a soft reset. A soft reset on Palm OS
devices “reinitialize[s] the globals and the dynamic memory heap” [23]. Choosing a
common execution point freed us from recording the registers values and consequently,
writing these values in the emulated registers at precisely the beginning of a session.
Instead, we perform a soft reset that initiates a deterministic sequence of actions that
populates the registers.
A complete initial state consists of the flash image, all programs and databases
from RAM, and the current point of execution. This information is transferred to the
desktop at the beginning of a session.
2.3 Activity Logs
The deterministic state machine model specifies that in addition to the initial
state, a complete sequence of external inputs, or activity log, is needed to accurately
replay a session. For an activity log to be complete for a handheld, all forms of inputs
must be considered. For this reason, this section begins by discussing inputs to
handhelds. Next, our chosen method of data collection, hacks, is explained, detailing
the five hacks that we wrote to collect user inputs. This section concludes with an
evaluation of activity logs.
2.3.1 Handheld Inputs
Handheld computers are naturally interactive and allow for different forms of
input from users. To produce a complete activity log, all relevant input needs to be
recorded. A typical handheld allows for the following inputs:
• Digitizer (touch screen)
• Buttons
• Real Time Clock (RTC)
• Memory Card
• IrDA/Serial Port
• Reset Button
Figure 2.2: How A Hack Works.
Of these inputs, we collectively record stylus movements on the digitizer (touch
screen), button presses (up, down, power, HotSync cradle and four application but-
tons) and the real time clock (RTC). These three input types account for the majority
of input on handhelds. Also, the insertion, removal, and name of a memory card can
be detected with our technique. We have chosen not to use memory cards in this
study due to the extra complexity and requirements in storing activity logs and simu-
lation. Due to time constraints and an expected increase in overhead, IrDA and serial
port activity are not collected. For simplicity, the reset event was not implemented in
the simulator. Recording the event of a system turning off and on again has inherent
problems that we have left for future work.
2.3.2 Hacks
In the Palm OS community, the word hacks has a specific meaning. A hack is a
section of code contained in a routine that is “called in addition to or in lieu of the
standard Palm OS routines” [24]. By default, at the time a system routine is called,
Figure 2.3: The Record Structure For Button, Pen Up And Notification Events.
the operating system looks up its address in the trap dispatch table, jumps to that
location and continues execution. After the system routine finishes, execution contin-
ues in the application that called the routine. Inserting the address of a hack into the
trap dispatch table forces the system to call the hack instead of the default system
routine (see Figure 2.2). Since the system passes the same parameters to the hack
that it would have if it were calling the default system routine, the hack must accept
the same arguments as the system routine for proper system execution. Typically,
a hack either starts or ends by calling the default routine, and then returning the
result of that call. Often a hack manager, such as HackMaster [25], TealMaster [26]
or X-Master [27] is employed to manage modifications to the trap dispatch table. A
hack manager also provides a user interface to activate, deactivate and arrange the
order of hacks.
Unlike Palmist, which uses a hack for every system call, our simulator only requires
one hack for every system routine that processes user input. We wrote five hacks to
patch the following system routines: EvtEnqueueKey, EvtEnqueuePenPoint,
KeyCurrentState, SysNotifyBroadcast and SysRandom. Each of these
hacks opens a common database, inserts a record with the current tick counter and
the real time clock values, the event type and any necessary data. It then closes the
common database. Each hack also makes a call to the original system routine. We
now explain each of these five hacks in greater detail.
Figure 2.4: The Record Structure For A Pen Event.
EvtEnqueueKey Hack
We wrote a hack for the EvtEnqueueKey system routine to record the time
at which a user pressed a button. After the system handles an interrupt generated
by a button press, the event is placed in a queue by the EvtEnqueueKey system
routine. The up, down, power, HotSync cradle and four application buttons are all
enqueued with this function. We replaced the address of the EvtEnqueueKey
system routine in the system trap table with the address of our hack. Our hack
stores a record on the handheld containing information about the button that was
pressed. Figure 2.3 describes the record structure used. timeStamp represents the
tick counter (with a resolution of 10 msec). RTCtime is a copy of the RTCTIME
Dragonball registers and contains the hours, minutes and seconds of the RTC (real
time clock). dayInHours defines the calendar date by recording the number of
hours since January 1, 1904. eventType records the type of event for the record.
Before our hack finishes, it passes the parameters it received to the EvtEnqueueKey
system routine.
EvtEnqueuePenPoint Hack
We wrote another hack for the EvtEnqueuePenPoint system routine. This
hack records pen movements, including pen up events (generated when the user stops
pressing on the screen). PalmSource classifies the EvtEnqueuePenPoint routine
as an undocumented system use only function. The Quartus Forth Manual states that
Figure 2.5: The Record Structure For A Call To KeyCurrentState.
it takes a PointType pointer and has a return type of Err [28]. After some trial
and error verification of recording stylus movements, we determined that the Point-
Type pointer contains the digitizer coordinates of the stylus. Digitizer coordinates
vary from device to device in relationship to the objects displayed. Recalibrating
a device changes the relationship of the sampled location to the objects displayed.
Pixel coordinates, on the other hand, represent the location of the pixel on the screen.
They do not vary from device to device in relation to the objects on the screen. Our
hack calls the Palm OS system routine PenRawToScreen to convert digitizer co-
ordinates to pixel coordinates. This information, along with the timestamps, is
stored in a penEventRecord. Figure 2.4 illustrates the penEventRecord. The
variables with the same names as in a genericEventRecord have the same mean-
ings. Additionally, x and y are pixel coordinates. A pen up event does not contain
coordinate information. To conserve memory, we use a genericEventRecord (see
Figure 2.3) to record the time that this event occurred.
KeyCurrentState Hack
Although our hack for EvtEnqueueKey records the time at which a button
is initially pressed, another hack is needed to capture the duration a user pressed
the button. To accomplish this, we wrote a hack that records the value returned by
KeyCurrentState in a keyCurrentStateCallRecord. Figure 2.5 details the
structure of a keyCurrentStateCallRecord. In the record, the keyBitField
is the value returned from KeyCurrentState, representing the buttons that are
currently pressed. All other values in the record have the same representations as
before. The Palm OS system routine KeyCurrentState returns a bit pattern
representing the buttons that are currently being pressed. Each bit represents one
of the eight buttons on a Palm OS handheld (power, up and down, four application
buttons and the cradle or cable HotSync button).
SysNotifyBroadcast Hack
To capture all of the user inputs, we wrote a hack to record the relevant notifica-
tions sent by the SysNotifyBroadcast system routine. SysNotifyBroadcast,
as its name suggests, broadcasts a system notification to all applications registered
for that broadcast [23]. Notifications are used as an internal messaging protocol and
include notifications for events such as the insertion or removal of a volume (i.e.,
memory card), the deletion of a database, or the transition of the system to a sleep
state.
The SysNotifyBroadcast routine sends a sysNotifyLateWakeupEvent
notification when a user turns the system on. Similarly, it broadcasts a sysNoti-
fySleepNotifyEvent notification when the user turns the device off. When a user
presses a button to turn a device on, the system executes a different execution path
than when the device is turned off. This type of button press is not enqueued by the
EvtEnqueueKey routine. For this reason, our hack records the occurrence of a
sysNotifyLateWakeupEvent notification, storing this information in a gener-
icEventRecord (see Figure 2.3). We chose to record the event of a user turning
the device off with our hack for EvtEnqueueKey instead of with our hack for
SysNotifyBroadcast to simplify the logic in the EvtEnqueueKey hack.
Figure 2.6: The Record Structure For A Call To SysRandom.
SysRandom Hack
Our final hack records every time the operating system re-seeds the random num-
ber generator. Although re-seeding is not a user input, we discovered that it is
necessary to record the new seed value to replay some sessions that call SysRandom,
due to minuscule timing differences. The operating system seeds and re-seeds
the random number generator with calls to SysRandom.
SysRandom is a Palm OS system routine that returns a random number between
zero and sysRandomMax [23]. If it is called with a non-zero value, the parameter
is interpreted as a new seed and it returns the first random number in that sequence.
Calling it with zero returns a random number using the existing seed.
As with all the other hacks, this one records the event’s timestamps and signifies
its type in a randomCallRecord. Figure 2.6 details the randomCallRecord
record structure, with the seed representing a non-zero seed value passed to Sys-
Random. Zero seed value calls to SysRandom do not need to be recorded since the
device will deterministically issue random numbers in a sequence based on the seed.
2.3.3 Evaluation of Activity Logs
The overhead introduced by collecting the activity logs, in terms of storage re-
quirements and execution time, is negligible. The individual records each consume
twelve (see Figures 2.3 and 2.4) or sixteen bytes (see Figures 2.5 and 2.6) on the
handheld.
Figure 2.7: Average Overhead For The EvtEnqueueKey Hack.
We found that volunteer users could operate a device normally for more than a
week without having the database reach the maximum number of records (65,536).
Gannamaraju and Chandra describe an implementation to overcome this barrier at
the expense of additional execution time [13]. If the database contains the maximum
number of the largest size records, it would require a total of 1536 KB of memory for
the records and the database header information.
We asked volunteer users to qualitatively indicate any general differences in exe-
cution times with hacks enabled and disabled. They reported no perceived difference
except for computationally intensive applications such as Graffiti Anywhere [29].
We quantitatively measured the overhead of the EvtEnqueuePenPoint hack by
counting the number of pen events per second in the database with the stylus con-
tinuously pressed against the screen. The test was implemented on a Palm m515,
which samples pen movements 50 times a second. The database initially contained
no records. The device recorded an average of 50.0 pen events per second in the
database, indicating no perceptible overhead for pen sampling.
Figure 2.8: Average Overhead For Each Hack.

We also designed a test that called a hack in a tight loop on a handheld to further
quantify the overhead. The test eliminated the call to the original system routine
to isolate the overhead associated with the hack. Figure 2.7 illustrates the average
execution time per call of the EvtEnqueueKey hack. This test revealed that the
overhead of each call is proportional to the size of the database. For example, the
average overhead for the EvtEnqueueKey hack when the database had zero to 10,000
records was 6.4 ms. However, when the database had 50,000 to 60,000 records, the
average overhead was 15.5 ms per call. Given that the position of the stylus is sampled
every 20 ms, the overhead introduced by a hack when the database is nearly full is
significant.
To keep the overhead at an acceptable level, we limited sessions to shorter time
periods (e.g., two to three days). Figure 2.8 depicts the average overhead of each
hack averaged over the first 30,000 iterations. Limiting sessions to two or three days
generally kept the number of records well below 30,000. The overhead varied only
slightly for each hack.
2.3.4 Summary of Activity Logs
We wrote five hacks to record activity logs containing user inputs on handheld
devices. These hacks record all pen and button activity produced by a user. Further-
more, our hacks have an acceptable amount of overhead.
2.4 Playback of Activity Logs
The initial state and an activity log provide very little information in and of them-
selves. Their purpose is to provide the necessary information to replay the workload
for the time period they were collected on a handheld. As stated in Section 2.1, an
equivalent system can be used to start in the same initial state and replay the same
inputs. This equivalent system can then be used to provide statistics and perfor-
mance information without disturbing the integrity of the original workload. Our
tool utilizes the Palm OS Emulator, POSE, as our equivalent system.
In this section, we first introduce POSE. We then explain the modifications we
made to it. Next, we provide details of how we set up our simulator for playback.
Finally, some limitations are described.
2.4.1 The Palm OS Emulator
The Palm OS Emulator, POSE [21], is a publicly available and widely used devel-
opment tool for handhelds running Palm OS [30]. POSE, maintained by PalmSource
Inc., is based on Greg Hewgill’s Copilot [31]. It runs on a desktop system, and
emulates Palm OS based systems (e.g., Handspring, Palm, and Symbol handhelds).
POSE does this by keeping track of the processor state and fetching instructions out
of a flash image and an allocated RAM. Stylus and button presses are simulated with
the mouse. The emulator generates an event record to process the input. This event
record is returned to the system when EvtGetEvent is called (typically in the
application’s event loop). Our simulator is a modified version of POSE 3.5 that is
instrumented to replay activity logs and collect performance data.
POSE is a suitable framework for playback for several reasons. First, POSE ac-
curately emulates the Palm OS and the supporting hardware. It fetches the same
instructions as a handheld and updates registers and memory banks just as a phys-
ical device does. Second, POSE is already a widely used and accepted tool in the
development community for Palm OS devices. Third, it is free and distributable,
licensed under the GNU General Public License [32]. Fourth, POSE allows for
quicker development by building on an existing software base. Finally, the source
code for POSE is available for Windows, UNIX and Mac OS platforms.
The Palm OS Emulator offers many resources for profiling, debugging, and testing.
Two functions relevant to this project are Gremlins and Profiling. Gremlins provide
pseudorandom events to flush out bugs in applications. A developer can set up
Gremlins to run for a specified number of events or indefinitely. When an error occurs,
Gremlins stops and the same sequence of events can be run again. Profiling provides
a developer with detailed information on the routines and locations the application
executes. It is set up to collect statistics and write them to a file, and can be enabled
or disabled at runtime.
2.4.2 Modifications To POSE
We first modified POSE to generate stylus movements and button presses from an
activity log. During initialization POSE reads a parsed activity log file and divides
it into three groups — a list of events to be replayed synchronously with the tick
counter and two queues. One queue is for KeyCurrentState key bit fields and
the other is for random seeds (generated by non-zero SysRandom calls). We use
Gremlins as a template to execute the synchronous list of events (i.e., pen down, pen
up and button presses).
To ensure the correct timing of the synchronous events, the emulated system’s
tick counter is checked to see if it is greater than or equal to the tick timestamp of
the next event. If it is time for the next event, the emulator simulates the event
and then continues to execute the same instructions as the handheld would until
the next event is scheduled to occur. This process continues until the synchronous
event log is exhausted.
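The tick-synchronous replay loop above can be sketched as follows. The Event and FakeEmulator classes are hypothetical stand-ins for the parsed activity-log entries and the instrumented POSE core; only the timing logic mirrors the description.

```python
from dataclasses import dataclass

@dataclass
class Event:               # hypothetical parsed activity-log entry
    tick: int
    kind: str

class FakeEmulator:        # stand-in for the instrumented emulator core
    def __init__(self):
        self.ticks = 0
        self.injected = []
    def step(self):                  # execute "one instruction"
        self.ticks += 1
    def inject(self, event):         # simulate the pen/button event
        self.injected.append((self.ticks, event.kind))

def replay(events, emu):
    # Fire each synchronous event once the emulated tick counter is
    # greater than or equal to its recorded tick timestamp.
    pending = sorted(events, key=lambda e: e.tick)
    i = 0
    while i < len(pending):
        emu.step()
        while i < len(pending) and emu.ticks >= pending[i].tick:
            emu.inject(pending[i])
            i += 1

emu = FakeEmulator()
replay([Event(5, "penDown"), Event(9, "penUp")], emu)
assert emu.injected == [(5, "penDown"), (9, "penUp")]
```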
The KeyCurrentState key bit field and random seed queues are utilized when-
ever their respective system calls are made. To ensure proper execution, the emulator
handles calls to the system routines KeyCurrentState and SysRandom differently.
For the routine KeyCurrentState, the emulator executes the function, then
looks up the appropriate key bit field to return based on the emulated tick timer and
the queue elements’ tick timestamps. The emulator handles calls to SysRandom in
much the same way when its parameter is non-zero. The difference is that before
SysRandom is called, the parameter is overwritten with the next seed value from
the queue, and execution continues. When the parameter for
SysRandom is zero, SysRandom is executed as normal, returning the next random
number from the existing seed’s sequence.
We further modified POSE to track and output statistical execution information
such as opcodes and memory references. To record the opcodes, we treated each exe-
cuted opcode as an index into an array, and incremented the respective array element.
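The opcode-counting instrumentation amounts to a 65,536-entry histogram, since the first word of a 68K instruction is 16 bits wide. A sketch (the opcode stream below is illustrative):

```python
# One counter per possible 16-bit opcode word.
opcode_counts = [0] * 0x10000

def record_opcode(opcode):
    # Treat the executed opcode as an index into the array and
    # increment the respective element.
    opcode_counts[opcode & 0xFFFF] += 1

for op in (0x4E75, 0x4E75, 0x7001):   # RTS, RTS, MOVEQ #1,D0
    record_opcode(op)
assert opcode_counts[0x4E75] == 2
```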
Furthermore, we modified Profiling functions to track the memory references. These
values can be written to a file. We modified POSE to turn Profiling on and off upon
starting and stopping Gremlins. When Profiling is enabled, POSE’s native speed
optimizations are ignored in favor of the original Palm OS code. A simple example of
this involves the TrapDispatcher. A Palm OS compiler inserts Trap instructions
so that the system can handle system calls. When a Trap instruction is executed, and Pro-
filing is enabled, the “TrapDispatcher looks at the program counter that was pushed
onto the stack, uses it to fetch the dispatch number following the TRAP opcode, and
uses the dispatch number to jump to the requested ROM function. [When Profiling
is disabled,] the emulator essentially bypasses the whole TrapDispatcher function and
. . . does the same operations with native x86, PPC, or 68K code” [33]. If Profiling
were not enabled, the emulator would skip executing several instructions that a
physical device would execute, invalidating the collected data.
2.4.3 Simulator Setup
To use the simulator to play back a session, we first specify the filename of the
activity log in a plain text file. Next, we run the emulator. After the emulator
is finished initializing, we import all of the applications and databases corresponding
with the initial state of the specified session. We then reset the emulator to get it
into the same processor state as when the activity log started. Finally, we tell the
emulator to start Gremlins which begins the replay of events.
2.4.4 Limitations
Although our simulator works extremely well for workloads with state-based ap-
plications, choosing the Palm OS Emulator for playback does come with some limi-
tations with respect to timing. POSE simulates the processor’s tick counter but not
the real-time clock (RTC). We approximated the RTC by using the amount of time
elapsed on the emulator’s host machine (i.e., a desktop) since the last event. All
events contain an RTC timestamp. Furthermore, due to the approximated RTC and
other fine-grained timing differences (e.g., memory reads and interrupts), the
combination of POSE and our methods is not appropriate for timing-sensitive applications.
The remaining limitations originate from the failure to fully implement functions
found on handheld devices such as the serial port and memory card. These limitations
are addressed in Section 5.1, Future Work.
By using the Palm OS Emulator to replay activity logs and collect data, we
gather information about the workload on the handheld. POSE is used to replay
users’ activity because it is accurate and easily distributable.
Chapter 3
System Validation
In the previous chapter, we discussed the deterministic state machine model. This
model guarantees that equivalent deterministic state machines that start in the same
state and have the same inputs applied follow the same execution path and end in
the same state. To validate the accuracy of our trace-driven simulator we employed a
two-fold approach. First, we compared the inputs recorded on a Palm OS device while
executing a test workload with the inputs recorded during playback on the emulator.
Next, we correlated the handheld’s final state with the final emulated state.
In both cases we used three different test workloads for validation.
3.1 Test Setup
We performed both of the validation tests on a Palm m515 with a 33 MHz Motorola
Dragonball MC68VZ328 processor and 16 MB of RAM. We utilized X-Master [27]
to manage the hacks (see Section 2.3.2). We loaded three additional applications to
prepare the handheld to collect activity logs. We wrote the first two applications
to create a common database (the activity log) and to set all the backup bits of
the databases. The third application, FileZ, a file manager, is produced by nosleep
software [34] and is widely distributed as freeware.
The initial state for the first test workload mirrors the default factory settings for
the system, except for the presence of the above mentioned applications and database.
The initial state of the second test workload is the same as the final state for the first
test workload. Also, the initial state of the last workload is the same as the final state
of the second workload.
To begin the simulations, we loaded the simulator with the initial state by import-
ing the applications and databases for the respective session and reset the simulator.
To begin playback, we invoked Gremlins. After the simulator replayed the last event
in the activity log, we stopped Gremlins. To obtain the activity log and the final
emulated state, we HotSynced the emulated system.
3.2 Test Workloads
The following three test workloads were set up as described in the previous section.
Workloads 1 and 2 followed a predefined script of actions. Workload 3 illustrates
playing a game.
Workload 1
The first test involved entering a moderate amount of text into Palm’s Memopad
application. The general format was inspired by a volunteer user. The text entered
into Memopad was exactly as follows:
Credit Card Purchases
Date Item Amount
3/5 Milk $3.17
3/6 Head Lights $26.55
3/7 Pizza $5.36
3/8 Pop $.73
3/8 Safety Inspections $30.00
3/9 Food $14.54
3/9 Gas $14.44
3/10 Friend $8.00
3/10 Calling Card $10.00
3/13 Juice $2.70
3/13 Gas $18.00
Figure 3.1: Palm’s Puzzle Game.
Workload 2
Workload 2 exercised Palm’s four basic applications, Datebook, Address, To Do
and Memopad. First, a Datebook entry for June 18, 2004 entitled Last day to
schedule thesis defense was entered. An alarm was also set to go off 15 days
before the event. Next, an address record was entered with the following information:
Title: Computer Science Department
Company: Brigham Young University
Main Phone: (801) 422-3027
Fax: (801) 422-0169
Address: 3361 TMCB PO Box 26576
City: Provo
State: Utah
Zip: 84602-6576
Then, a To Do item with the label of Graduate was entered with a due date of 7/2/04.
Finally, a memo containing the following text was entered:
To err is human, but to really foul things up requires a computer.
- Farmers’ Almanac, 1978
Workload 3
Workload 3 illustrates the simulator’s ability to replay some activity logs with
random number calls. For this test, we solved a game of Puzzle. Puzzle is bundled
with the Palm Desktop Software [35]. The game consists of 15 squares numbered 1
to 15 and an open slot confined in a square frame (see Figure 3.1). The object of
the game is to arrange the squares in ascending order. The squares are moved by
pressing on them and dragging the stylus to the desired location. A new randomly
arranged game is started by pressing the New Puzzle button.
3.3 Activity Log Correlation
To validate the simulator, we first verified that the inputs collected from the
physical device were replayed on the simulator. To do this, we compared the activity
log generated while executing a test case and the activity log created during playback.
To obtain the activity log on the simulator, we imported our hacks and X-Master along
with the other applications and databases. Since the simulator accurately emulates
the handheld system, the hacks were executed just as they did on the handheld.
The activity log from the handheld and that of the emulated session correlate
very well. Each pen event recorded in the original activity log also appeared in the
emulated activity log with the same coordinates. Furthermore, all of the button
events were accurately replayed. However, the events in the emulated activity log
sometimes occurred in short bursts. A burst of events would occur slightly behind
schedule (< 20 ticks), and then return to being replayed at exactly the right tick
count. The bursts are believed to be due to the different threads of the simulator
running at different times, delaying the core processing thread momentarily. For this
same reason, the KeyCurrentState events are at times slightly off in the emulated
activity log. Although the activity logs from the initial session and the emulated
session do not perfectly match, they contain virtually the same inputs, retaining the
integrity of the log.
3.4 Final State Correlation
To further validate the simulator, we compared the final state of a test session on
a handheld and the final state of the emulated session. The correlation of the two
final states is not a holistic validation but reflects the aggregate of the events. Since
the final state of a session is largely determined by the state of memory, we analyzed
the databases that were transferred to the desktop via a HotSync at the end of a
session.
On a Palm OS device, applications are stored internally in the same format as
record databases. Each database starts with a header, followed by the indexes of
the records, and then the records themselves. The only structural difference between
an application database and a record database is the presence of one or more code
(or other) resources.
To quantify the differences between the final state of the handheld session and
that of the emulated session, we compared the respective databases field by field.
The databases correlated extremely well. The only exceptions are three fields entitled
creationDate, lastBackupDate, and modificationDate and the database
named psysLaunchDB. The three fields represent the number of seconds since 12:00
A.M. on January 1, 1904 that the database was either created, backed up, or modified,
respectively. These three fields regularly differ between the two final states. The
differences are attributed to the procedure of importing and exporting the databases
to and from the simulator. The creationDate field for databases on the simu-
lator was always zero, since the databases were imported instead of created. The
lastBackupDate field was also always zero for the same databases because they
were never backed up on the emulated system. The modificationDate field for
these databases was also zero unless it had a timestamp corresponding with the date
of replay. Exporting the databases for the emulated session caused some databases
to record the time of replay as the modificationDate. The differences in the
timestamps in the database headers do not adversely affect the performance of the
simulator.
The psysLaunchDB database is used solely by the operating system and has an
unpublished format. We estimate from its name and raw contents that it stores
information about applications that can be run from the home screen. The few
single-byte differences between the records of the two databases are also attributed
to the procedure of loading databases into the simulator. We estimate that the execution
of the system is not adversely affected by the differences in psysLaunchDB.
Our validation shows that the simulator replays activity logs very accurately. The
few differences do not affect the integrity of the simulation, making it suitable for
performance and analysis studies.
Chapter 4
Cache Case Study
This chapter presents an in-depth case study analyzing the effects on memory performance
of adding different caches to a Palm OS device. In this chapter we present
what we believe to be the first cache simulation results for devices running Palm OS.
4.1 Introduction
Over the past decades, the gap between processor and memory performance has
widened (see Figure 4.1). Hennessy and Patterson report that processor performance
has increased annually by 55% since 1986 while memory lags behind with an increase
of 7% per year [36]. To reduce this gap, memory hierarchies have been designed to
take advantage of locality to maximize performance and minimize cost.
A simple memory hierarchy is shown in Figure 4.2. The memory closest to the
CPU is referred to as a cache. A cache contains a subset of the information found in
memory but is designed to respond more quickly.
When the processor requests the contents of a memory location from a cache, and
that information is found, it is called a hit. Conversely, if the contents are not found
in the cache, it is called a miss. When a miss occurs, a block is read from memory,
placed in the cache, and returned to the processor. A block is typically in the tens of
bytes, and contains the contents of the desired location and adjacent addresses.
Figure 4.1: CPU And Memory Performance (Data From Hennessy And Patterson [36]).
Figure 4.2: A Simple Memory Hierarchy.
As shown in Figure 4.3, the memory address is divided into sections to locate the
desired block and the offset within that block in the cache. The least significant bits
define the offset within a block. The middle bits of the address specify which set
to look in, and are called the index bits. The most significant bits of the address
are the block’s ID. This part of the address is referred to as the tag. A cache has a
corresponding entry for every block. This allows a cache to determine if a given block
in the right set of a cache is the desired block.
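The tag/index/offset decomposition described above reduces to a few shifts and masks. The helper below is an illustrative sketch, assuming power-of-two line size and set count:

```python
def split_address(addr, line_size, num_sets):
    """Split a memory address into (tag, set index, byte offset).

    line_size and num_sets must be powers of two."""
    offset_bits = (line_size - 1).bit_length()   # e.g., 32-byte line -> 5 bits
    index_bits = (num_sets - 1).bit_length()
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split_address(0x12345678, line_size=32, num_sets=256)
# The three fields reassemble into the original address (5 + 8 = 13 low bits).
assert (tag << 13) | (index << 5) | offset == 0x12345678
```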
When a cache miss occurs, an existing block must be displaced to make room for
Figure 4.3: Components Of A Cache Address.
Figure 4.4: Palm Memory Hierarchy.
the requested block. The most common algorithm for selecting the block to replace
is the least-recently used (LRU) algorithm.
Average effective memory access time is used to quantify the performance of various
cache configurations and is described by the following equation:
Teff = Thit + MR · Tmiss    (4.1)
Thit is the time (in CPU clock cycles) that a cache requires to service a cache hit.
MR, the miss rate of the cache, specifies the percentage of requests that result in a
miss. Tmiss is the time (in CPU clock cycles) that a memory hierarchy requires to
deliver the desired block on a cache miss.
Since Palm OS devices have both RAM and flash memory (see Figure 4.4), the
average effective memory access time for a Palm OS device is computed with the
following equation:
Teff = Thit + MR · (REFRAM/REFtotal · TRAM miss + REFflash/REFtotal · Tflash miss)    (4.2)
where REFtotal = REFRAM + REFflash. TRAM miss and Tflash miss are the CPU cycles
Session RAM Refs Flash Refs Events Real Time Run Time Ave Mem Cyc
1 213,732,035 443,329,207 1243 24:34:31 7:01 2.35
2 31,261,532 68,897,235 933 48:28:56 4:47 2.38
3 33,670,308 76,069,375 755 24:52:55 3:56 2.39
4 234,378,096 485,808,992 1622 141:27:26 8:44 2.35
Table 4.1: Volunteer User Session Data.
required to fetch a block from the RAM and flash respectively. REFRAM and REFflash
are the number of RAM and flash references.
The following equation is used to calculate the average e!ective memory access
time for a system without a cache:
Teff = REFRAM/REFtotal · TRAM miss + REFflash/REFtotal · Tflash miss    (4.3)
With a handheld device, not only is performance important, but power consump-
tion is as well. The system performance of a device is the time required to perform a
task. With a handheld, power consumption defines the amount of time a device can
be used. Studies have shown that adding a cache not only increases performance but
can also reduce battery consumption for portable devices [37].
4.2 Experimental Setup
Table 4.1 summarizes the four sessions used in our cache simulations. Events refers
to the number of entries in each of the activity logs. Real Time is the amount of time
(in HH:MM:SS) each session spans. Run Time is the period of time (in MM:SS) that
the processor was active (either running or dozing, but not sleeping). Ave Mem Cyc
is the average effective memory access time (in cycles) without a cache, as computed
with Equation 4.3.
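As a consistency check, session 1's Ave Mem Cyc value can be reproduced from its RAM and flash reference counts using Equation 4.3 with the Dragonball's one-cycle RAM and three-cycle flash access times (given in Section 4.3):

```python
# Session 1 from Table 4.1.
ram_refs, flash_refs = 213_732_035, 443_329_207
total = ram_refs + flash_refs

# Equation 4.3: weighted average of the RAM (1 cycle) and flash
# (3 cycle) access times, with no cache present.
t_eff = (ram_refs / total) * 1 + (flash_refs / total) * 3
print(round(t_eff, 2))   # 2.35, matching the table
```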
The volunteer user operated a Palm m515 device with 16 MB of RAM, 4 MB of
flash and a 33 MHz Dragonball MC68VZ328 processor (which does not have a cache).
Before each session, we initialized the device and collected the necessary information
for the simulator, as described in Chapter 2. While replaying a session with the
emulator, we collected a log of the memory requests. We used these logs, or traces,
with a cache simulator that we wrote to obtain cache miss rates. We simulated 56
different cache configurations by varying the cache size, line size, and associativity.
The LRU replacement policy was used in every configuration.
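The cache simulator itself is not listed in the thesis; the following is an illustrative sketch of a set-associative cache with LRU replacement in the same spirit, using an ordered dictionary per set to track recency:

```python
from collections import OrderedDict

class Cache:
    """Sketch of a set-associative cache with LRU replacement."""
    def __init__(self, size, line_size, assoc):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (line_size * assoc)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        s = self.sets[block % self.num_sets]
        if block in s:
            s.move_to_end(block)        # mark most-recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(s) >= self.assoc:
                s.popitem(last=False)   # evict least-recently used
            s[block] = True

cache = Cache(size=1024, line_size=32, assoc=2)
for addr in (0, 4, 32, 0, 2048):        # illustrative address trace
    cache.access(addr)
assert (cache.hits, cache.misses) == (2, 3)
```

Feeding the memory-reference traces through such a simulator for each size/line-size/associativity combination yields the miss rates plotted in Figure 4.5.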
4.3 Cache Simulation Results
Figure 4.5 illustrates miss rates for the 56 cache configurations of one session.
These results are typical of the other sessions in Table 4.1. Caches with a line size
of 32 bytes performed better than those with 16-byte lines, except for the largest
cache sizes simulated with 4- and 8-way set associativities. Furthermore, increasing
the associativity typically decreases the miss rate.
The Dragonball MC68VZ328 requires one cycle for RAM accesses and three cycles
for flash accesses [38]. With the flash contributing about two thirds of the total
references, the average effective memory access time is closer to a flash access time
than to a RAM access time (see Table 4.1). We used Equation 4.2 with Thit equal
to one cycle and TRAM miss and Tflash miss equal to one and three cycles respectively to
compute the average effective memory access times. Figure 4.6 displays the results for
the same 56 cache configurations. In all configurations, adding a cache significantly
reduces the average memory access time.
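To see why the reduction can exceed 50%, plug a hypothetical miss rate into Equation 4.2; the 5% figure below is illustrative only, not a measured result from Figure 4.5.

```python
# Session 1 reference mix from Table 4.1.
ram_refs, flash_refs = 213_732_035, 443_329_207
total = ram_refs + flash_refs

# Average miss penalty: RAM miss = 1 cycle, flash miss = 3 cycles.
t_miss_avg = (ram_refs / total) * 1 + (flash_refs / total) * 3

no_cache = t_miss_avg                # Equation 4.3
mr = 0.05                            # hypothetical 5% miss rate
with_cache = 1 + mr * t_miss_avg     # Equation 4.2 with T_hit = 1

assert with_cache < 0.5 * no_cache   # better than a 50% reduction
```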
The small cache sizes used in this study exhibit the same miss rate trends found in
larger caches used in desktop systems. Figure 4.7 shows the miss rates for a desktop
system, exemplifying these trends. This sample comes from the Trace Distribution
Center at Brigham Young University, a repository of desktop and server traces.
4.4 Conclusion
Average memory access times for current Palm OS workloads suffer because requests
to the flash, which require three times as many cycles to complete as requests to the
RAM, dominate the total references. Our cache simulations show that even relatively
small caches can reduce the effective memory access time by 50% or more. This is
mostly because the flash memory receives the majority of references. Adding a cache
to a Palm OS device using a Dragonball MC68VZ328 processor can greatly reduce
the average effective memory access time and potentially reduce battery consumption.
Figure 4.5: Miss Rates For 56 Cache Configurations.
Figure 4.6: Average Effective Memory Access Times.
Figure 4.7: Miss Rates For A Desktop Address Trace.
Chapter 5
Conclusion
Hardware prototypes are expensive to build. To minimize the overall development
cost of new hardware, system architects rely on simulation to guide development
choices. Trace-driven simulators provide more accurate performance estimates than
other simulation approaches.
We have designed a trace-driven simulator for Palm OS devices. First, the initial
state is collected from a handheld device, then the device is instrumented to record
external inputs with five di!erent hacks. The log of external inputs, or activity log,
is collected while an actual user is running real applications under normal operating
conditions. After transferring the activity log to a desktop system, we replayed it on
an equivalent system, instrumented to record opcode usage, memory references and
performance statistics. With this data, tests such as energy consumption and cache
simulations can be realistically and accurately performed. Furthermore, our cache
simulations show that adding even a small cache memory to a Palm OS device can
greatly reduce the average effective memory access time.
5.1 Future Work
While our simulator provides statistical performance metrics for the most com-
monly used features of Palm OS devices, it does not currently replay activity logs
41
that involve memory cards, IrDA or serial port activity, resets, or battery information.
Each of these features has been left for future work and would allow a more
diverse group of sessions to be replayed with the simulator. Furthermore, a more
comprehensive study of usage behavior with more users over a longer time frame
would provide more accurate estimates for cache simulations and other such tests.
LIST OF REFERENCES
[1] J. C. Mogul and A. Borg, “The effect of context switches on cache performance,”
in Proceedings of the 4th International Conference on Architectural Support for
Programming Languages and Operating Systems. ACM, 1991, pp. 75–84.
[2] A. Borg, R. E. Kessler, and D. W. Wall, “Generation and analysis of very long
address traces,” in Proceedings of 17th International Symposium on Computer
Architecture. ACM, 1990, pp. 270–279.
[3] O. A. Olukotun, T. N. Mudge, and R. B. Brown, “Implementing a cache for
a high-performance GaAs microprocessor,” in Proceedings of the 18th International
Symposium on Computer Architecture. ACM, 1991, pp. 138–147.
[4] D. A. Wood, M. D. Hill, and R. E. Kessler, “A model for estimating trace-sample
miss ratios,” in Proc. of 1991 ACM Sigmetrics. ACM, 1991, pp. 79–89.
[5] S. Przybylski, M. Horowitz, and J. Hennessy, “Characteristics of
performance-optimal multi-level cache hierarchies,” in Proceedings of the 16th
Annual International Symposium on Computer Architecture. IEEE, 1989.
[6] J. K. Flanagan, B. E. Nelson, J. K. Archibald, and K. Grimsrud, “Incomplete
trace data and trace driven simulation,” in Proceedings of the International
Workshop on Modeling, Analysis and Simulation of Computer and Telecommu-
nication Systems MASCOTS. SCS, 1993, pp. 203–209.
[7] K. Grimsrud, J. K. Archibald, B. E. Nelson, and J. K. Flanagan, “Estimation of
simulation error due to trace inaccuracies.” IEEE Asilomar Conference, 1992.
[8] A. Borg, R. E. Kessler, G. Lazana, and D. W. Wall, “Long address traces from
risc machines: generation and analysis,” Research Report 89/14, Sept 1989.
[9] R. M. Fujimoto and W. C. Hare, “On the accuracy of multiprocessor tracing
techniques,” Georgia Institute of Technology, Report GIT-CC-92-53, June
1993.
[10] T. Kort, R. Cozza, L. Tay, and K. Maita, “PalmSource and Microsoft tie for 1Q04
PDA market lead,” Gartner Dataquest, Tech. Rep., April 2004.
[11] C. Bey, E. Freeman, G. Hillerson, J. Ostrem, R. Rodriguez, and
G. Wilson, “Palm OS Programmer’s Companion Volume I,” Available:
http://www.palmos.com/dev/support/docs/ [accessed July 2001], 1996-2001.
[12] ——, Palm OS Programmer’s Companion Volume I, PalmSource, 1996-2001.
[13] R. Gannamaraju and S. Chandra, “Palmist: A tool to log Palm system activity,”
in Proceedings of the IEEE 4th Annual Workshop on Workload Characterization
(WWC-4), Dec. 2001, pp. 111–119.
[14] C. Rose and K. Flanagan, “Complete instruction traces from incomplete address
traces (CITCAT),” Computer Architecture News, vol. 24, no. 5, pp. 1–8, Dec
1996.
[15] C. Rose, “CITCAT: Constructing Instruction Traces from Cache-filtered Address
Traces,” Master’s thesis, Brigham Young University, April 1999.
[16] J. Flinn and M. Satyanarayanan, “Powerscope: A tool for profiling the energy
usage of mobile applications,” in Workshop on Mobile Computing Systems and
Applications (WMCSA), Feb 1999, pp. 2–10.
[17] N. Vijaykrishnan, M. T. Kandemir, M. J. Irwin, H. S. Kim, and
W. Ye, “Energy-driven integrated hardware-software optimizations using
SimplePower,” in ISCA, 2000, pp. 95–106. [Online]. Available:
citeseer.nj.nec.com/vijaykrishnan00energydriven.html
[18] W. Ye, N. Vijaykrishnan, M. T. Kandemir, and M. J. Irwin, “The
design and use of SimplePower: a cycle-accurate energy estimation tool,”
in Design Automation Conference, 2000, pp. 340–345. [Online]. Available:
citeseer.nj.nec.com/ye00design.html
[19] S. Lee, A. Ermedahl, S. L. Min, and N. Chang, “An accurate instruction-level
energy consumption model for embedded RISC processors,” in LCTES/OM,
2001, pp. 1–10. [Online]. Available: citeseer.nj.nec.com/lee01accurate.html
[20] T. L. Cignetti, K. Komarov, and C. S. Ellis, “Energy estimation tools for the
Palm,” in ACM MSWiM 2000: Modeling, Analysis and Simulation of Wireless
and Mobile Systems, Aug. 2000.
[21] PalmSource, Inc., Palm OS Emulator, Available:
http://www.palmos.com/dev/tools/emulator/ [accessed July 2001].
[22] Palm, Inc., “Palm support: HotSync technology support index,” Available:
http://www.palm.com/us/support/hotsync.html [accessed 2003].
[23] C. Bey, E. Freeman, G. Hillerson, J. Ostrem, R. Rodriguez, and
G. Wilson, “Palm OS Programmer’s API Reference,” Available:
http://www.palmos.com/dev/support/docs/ [accessed July 2001], 1996-2001.
[24] E. Keyes, “The HackMaster Protocol,” Available:
http://www.daggerware.com/hackapi.htm [accessed 2001].
[25] ——, “HackMaster v0.9,” Available: http://www.daggerware.net/hackmstr.htm
[accessed 2001].
[26] “TealMaster system extensions manager,” Available:
http://www.tealpoint.com/softmstr.htm [accessed 2003], TealPoint Software,
2004.
[27] LinkeSOFT, “X-Master 1.5 free extension (hack) manager,” Available:
http://linkesoft.com/xmaster/ [accessed 2002].
[28] N. Bridges, “Quartus Forth Manual,” Available:
http://www.quartus.net/products/forth/manual/specific.htm [accessed January
2004], 1998-1999.
[29] The Graffiti Anywhere Page, Available: http://www.escande.org/palm/GrfAnywhere/
[accessed 24 May 2004].
[30] PalmSource, Inc., “Palm OS, Palm Powered handhelds, and 18,000 software applications,”
Available: http://www.palmsource.com [accessed July 2001].
[31] G. Hewgill, “Copilot: Design and development,” Handheld Systems, vol. 6, no. 3,
May/June 1998.
[32] GNU General Public License, Free Software Foundation,
http://www.gnu.org/copyleft/gpl.html.
[33] K. Rollin, “The Palm OS Emulator,” Handheld Systems, vol. 6, no. 3, May/June
1998.
[34] “The nosleep software open source project,” Available:
http://nosleepsoftware.sourceforge.net/download.html [accessed April 2004],
September 2002.
[35] “Palm Desktop software for Windows,” Available:
http://www.palmone.com/us/support/downloads/win desktop.html [accessed
April 2004], palmOne, Inc., 1995-2001.
[36] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative
Approach, 3rd ed. Morgan Kaufmann Publishers, 2003.
[37] J. Su, “Cache optimization for energy efficient memory hierarchies,” Master’s
thesis, Brigham Young University, 1995.
[38] MC68VZ328 Integrated Processor User’s Manual, Motorola, February 2000.