A TRACE-DRIVEN SIMULATOR FOR PALM OS DEVICES
by
Hyrum D. Carroll
A thesis submitted to the faculty of
Brigham Young University
in partial fulfillment of the requirements for the degree of
Master of Science
Department of Computer Science
Brigham Young University
September 2004
Copyright © 2004 Hyrum D. Carroll
All Rights Reserved
BRIGHAM YOUNG UNIVERSITY
GRADUATE COMMITTEE APPROVAL
of a thesis submitted by
Hyrum D. Carroll
This thesis has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory.
Date J. Kelly Flanagan, Chair
Date Charles D. Knutson
Date Bryan S. Morse
BRIGHAM YOUNG UNIVERSITY
As chair of the candidate’s graduate committee, I have read the thesis of Hyrum D. Carroll in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library.
Date J. Kelly Flanagan, Chair, Graduate Committee
Accepted for the Department
David W. Embley, Graduate Coordinator
Accepted for the College
G. Rex Bryce, Associate Dean, College of Physical and Mathematical Sciences
ABSTRACT
A TRACE-DRIVEN SIMULATOR FOR PALM OS DEVICES
Hyrum D. Carroll
Department of Computer Science
Master of Science
Due to the high cost of producing hardware prototypes, software simulators are
typically used to determine the performance of proposed systems. To accurately
represent a system with a simulator, the simulator inputs need to be representative
of actual system usage. Trace-driven simulators that use logs of actual usage are
generally preferred by researchers and developers to other types of simulators to
determine expected performance.
In this thesis we explain the design and results of a trace-driven simulator for
Palm OS devices capable of starting in a specified state and replaying a log of inputs
originally generated on a handheld. We collect the user inputs with an acceptable
amount of overhead while a device is executing real applications in normal operating
environments. We based our simulator on the deterministic state machine model. The
model specifies that two equivalent systems that start in the same state and have the
same inputs applied, follow the same execution paths. By replaying the collected
inputs we are able to collect traces and performance statistics from the simulator
that are representative of actual usage with minimal perturbation.
Our simulator can be used to evaluate various hardware modifications to Palm OS
devices such as adding a cache. At the end of this thesis we present an in-depth case
study analyzing the expected memory performance from adding a cache to a Palm
m515 device.
ACKNOWLEDGMENTS
Many thanks are due to the individuals who contributed to this thesis in one form or
another. Satish Baniya helped in the development of the input collection methods.
Myles Watson lent his eyes to numerous problems and his mind to making the project
better. Kelly Flanagan has inspired me and kept me on track with his vision. My
wife, Melissa, has lent her daily support and encouragement throughout the entire
project. And ultimately God, for His inspiration and solutions.
TABLE OF CONTENTS
1 Introduction
1.1 Previous Work
1.2 Our Solution
1.3 Thesis Outline
2 System Architecture
2.1 High-Level Overview
2.2 Initial State Collection
2.3 Activity Logs
2.3.1 Handheld Inputs
2.3.2 Hacks
2.3.3 Evaluation of Activity Logs
2.3.4 Summary of Activity Logs
2.4 Playback of Activity Logs
2.4.1 The Palm OS Emulator
2.4.2 Modifications To POSE
2.4.3 Simulator Setup
2.4.4 Limitations
3 System Validation
3.1 Test Setup
3.2 Test Workloads
3.3 Activity Log Correlation
3.4 Final State Correlation
4 Cache Case Study
4.1 Introduction
4.2 Experimental Setup
4.3 Cache Simulation Results
4.4 Conclusion
5 Conclusion
5.1 Future Work
LIST OF TABLES
4.1 Volunteer User Session Data
LIST OF FIGURES
2.1 Application Of The Deterministic State Machine Model
2.2 How A Hack Works
2.3 The genericEventRecord Data Structure
2.4 The penEventRecord Data Structure
2.5 The keyCurrentStateCallRecord Data Structure
2.6 The randomCallRecord Data Structure
2.7 Average Overhead For The EvtEnqueueKey Hack
2.8 Average Overhead For Each Hack
3.1 Palm’s Puzzle Game
4.1 CPU And Memory Performance
4.2 A Simple Memory Hierarchy
4.3 Components Of A Cache Address
4.4 Palm Memory Hierarchy
4.5 Miss Rates For 56 Cache Configurations
4.6 Average Effective Memory Access Times
4.7 Miss Rates For A Desktop Address Trace
Chapter 1
Introduction
Before a new computer system is implemented in hardware, developers often use soft-
ware simulators to predict the performance of the new design. A detailed system
simulator subjects a virtual system to representative workloads for accurate evalua-
tion. Often a trace, or a sequence of system events recorded from an actual system,
is used as input to the simulator. Simulators that use traces for input are known as
trace-driven simulators.
Developers and researchers have demonstrated the utility of trace-driven simu-
lation for desktop and server systems. They have used this technique to evaluate
memory hierarchies, processor performance, etc. [1, 2, 3, 4, 5]. Rather than work-
ing on solutions to the well-known problems of collecting traces for desktops and
servers [6, 7, 8, 9], we undertook to collect traces in an interactive and mobile envi-
ronment.
In a mobile environment, the user is free to move around while still operating the
handheld and will use the device differently than a stationary computer. We observed
that mobile devices are typically used for short periods of time (less than a minute)
followed by short or long periods of inactivity, with occasional periods of heavy usage.
Often users will turn on the device just to look up a phone number or to check their
calendar. This type of sporadic usage is not characteristic of desktop use.
Since Palm OS is used on more handheld devices than any other operating sys-
tem [10], we focused on collecting traces for Palm OS devices. These devices are
handheld computers running the Palm OS, which has a “preemptive multitasking
kernel that provides basic task management” [11, 12]. Tens of millions of handhelds
around the world run Palm OS, with over a million units shipped in the first quarter
of 2004 alone [10].
Collecting these traces is a multi-step process. First, we instrument a Palm
OS handheld to record external inputs, referred to as an activity log. Next, we transfer
this log to a desktop computer. Using a simulator that we developed, we replay
the collected activity logs. We use the terms replay and playback interchangeably
to refer to the simulator’s reproduction of an execution of events from an activity
log. Furthermore, we instrumented the simulator to collect traces and performance
metrics during playback.
One of the key issues with a monitoring system is the amount of perturbation
introduced. An ideal monitoring procedure would not perturb the system at all.
Quantifying the perturbation helps to determine how close to ideal a given monitoring
system is. In this thesis we quantify the amount of perturbation with the amount
of overhead introduced, where overhead is the difference in execution time between
the instrumented system and the original system. If the monitoring procedure alters
the way in which a user operates a system, for example, by perceivably increasing
the amount of time required to process inputs, the overhead is unacceptable. On the
other hand, if a system can be monitored in such a way that the user can not perceive
the introduction of the monitoring system, the amount of the overhead is acceptable.
1.1 Previous Work
Gannamaraju and Chandra’s work with Palmist [13] provided a foundation for
our work of collecting user inputs. They collected system activity on mobile hand-
helds by recording the occurrence of 80% (707 of the 880) of the Palm OS 3.5 system
calls with hacks. (In the Palm OS community, hacks explicitly refer to system exten-
sions, and are described in detail in Section 2.3.2.) Due to their collection technique,
Gannamaraju and Chandra were unable to generate a complete sequence of system
activity. Since they recorded the occurrence of most of the system calls, their method
introduced a large amount of overhead. They reported that the time required for each
system call to execute increased by two or more orders of magnitude with Palmist
running compared to a system not running Palmist. This amount of overhead is
unacceptable. Furthermore, their technique imposes significant memory requirements.
For example, they generated 1.34 MB of records on the handheld to perform a set of
tasks that requires about one minute of execution without Palmist running. Given
that the maximum amount of memory found on Palm OS devices for which they
did their work is between 8 and 16 MB, only a few minutes of execution could be
traced. Using an external memory device to increase storage capacity would increase
the already high amount of overhead.
Rose and Flanagan’s work with CITCAT (Constructing Instruction Traces from
Cache-filtered Address Traces) [14] inspired the model for our simulator. The CIT-
CAT procedure defines the mechanisms used to collect and replay an initial memory
state image and important events. Rose and Flanagan implemented CITCAT and
produced complete instruction traces of desktop workloads. Flanagan defined com-
plete instruction traces to be “those traces containing all CPU generated references
including those produced by interrupts, system calls, exception handlers, other su-
pervisor activities, and user processes” [6]. To do this, they collected the “state of
the processor, caches, main memory, ... hardware interrupts, DMA, and other asyn-
chronous or peripheral events that influence the processor” [15] with the aid of a
hardware monitor such as a logic analyzer. They then initialized the simulator with
the collected initial machine state image, and replayed the asynchronous events. Dur-
ing playback, they collected performance data and instruction and address traces. In
their paper, Rose and Flanagan demonstrated that CITCAT can collect complete
traces of desktop workloads. However, they used a large and expensive logic analyzer
which is impractical for real-time mobile collection.
External hardware monitors, such as logic analyzers and oscilloscopes, signifi-
cantly alter the user’s behavior by restricting the mobility of the user. To use an
external monitor requires connecting probes to various pins to gather data. Hand-
helds monitored with such equipment have to be handled very carefully to keep the
probes attached. Furthermore, attaching a logic analyzer or oscilloscope to a hand-
held would greatly restrict the mobility of the device.
Several other studies besides Palmist have been performed on handheld devices,
but have focused on evaluating the energy consumption and efficiency. Flinn and
Satyanarayanan [16] collected information about handheld devices using oscilloscopes.
Cycle-accurate simulators [17, 18, 19] have also been used to estimate energy consump-
tion of handheld devices. For a simulator to produce accurate estimates, it needs
to be fed input representative of real usage. Cignetti, Komarov and Ellis’ work [20] is
unique in that they utilize a modified version of the Palm OS Emulator. They limited
their studies to energy consumption estimates.
1.2 Our Solution
Our approach unites Rose and Flanagan’s CITCAT [14] and Gannamaraju and
Chandra’s Palmist [13]. We borrow Rose and Flanagan’s fundamental model of repre-
senting a computer system as a deterministic state machine and apply it to handheld
devices. They showed that determining the execution path of a state machine, or a
computer system, requires the initial state and sequence of inputs. We collect the
initial state of a mobile device by storing its memory contents on a desktop com-
puter. To collect the sequence of inputs we use a collection technique similar to that
used by Gannamaraju and Chandra, but with orders of magnitude less overhead. To
illustrate the usefulness of our simulator, we present a case study demonstrating the
expected performance of adding cache memory to a handheld device. This case study
would not have been possible with previous data collection techniques for handheld
devices.
1.3 Thesis Outline
The rest of this thesis proceeds as follows: Chapter 2 explains the architecture of
our system by first describing it from a high-level perspective, and then elaborating
on the different components. The collection of the initial state, activity logs and
their playback will all be covered in this chapter. Chapter 3 illustrates two different
methods employed to validate the system. In the first method, we compare the
activity log of the user’s session and that of the emulated session. For the second
method, we compare the final state of an emulated session with the final state of
the handheld. In Chapter 4 we demonstrate the usefulness of the simulator and
present the expected cache memory performance for a Palm m515. The final chapter
concludes the thesis and describes future work.
Chapter 2
System Architecture
This chapter explains how we collect external inputs from a Palm OS device and
use this data as input to our trace-driven simulator. We first explain our model, in
general terms, then we describe our implementation of it. The model is divided into
three components. First, the model requires the initial state of the system (comprised
of the memory, processor, etc.). The second component is the sequence of inputs to
the system. We refer to the log of user inputs, along with the corresponding timing
information, as an activity log. The final component is an emulator which starts in
the initial state and processes the inputs to the system. We then explain and evaluate
the implementation of each of these components.
2.1 High-Level Overview
Since a computer processor is deterministic, we can use the deterministic state
machine model to represent it, as did Rose and Flanagan [15, 14]. First, let S₀ be an
initial state and I be a sequence of external inputs. The model specifies that every
time a deterministic state machine starts in S₀ and I is applied, the machine always
follows the same sequence of states and ends in the same final state. Furthermore,
if machine A and machine B are equivalent deterministic state machines, and both
start in S₀, and I is applied to both machines, then both machines A and B will follow
the same path of execution and end in the same state.
Figure 2.1: Application Of The Deterministic State Machine Model.
The deterministic state machine model applies to situations in which the workload
activity performed on system Suser is replicated on system Semulated (see Figure 2.1).
First, let Suser be a system instrumented to record the initial state and all external
inputs. Also, let Semulated be an equivalent system (e.g., an emulator or simulator) of
Suser that can replay the events from Suser in an environment in which measurements
can be made. At the end of a session (the period of time that inputs are collected) we
transfer the initial state and collected inputs from Suser to Semulated. Then Semulated is
started in the initial state of Suser for that session. Semulated then processes and replays
the collected activity log from Suser. Performance measurements and workload traces
can be generated internally or externally on Semulated. Since statistics and data are
gathered on Semulated while the activity log is replayed, they represent the original
execution by the user without system perturbation.
For our implementation of the model, Suser is a handheld computer operated
by a human. We instrumented the handheld to collect the initial state and user
inputs (i.e., pen movements and button presses). We then transferred the data for a
session to a desktop computer. Semulated is a Palm OS device emulator that accurately
models handheld computer devices. We execute the emulator on a desktop computer
and use the initial state and activity log from Suser for playback. During playback,
the emulator follows the same sequence of states as the handheld. By applying the
deterministic state machine model to handhelds, we can collect the initial state and
inputs of a device and replay them with an emulator on a desktop computer.
The following steps represent the chronological order of the methods we used to
collect the relevant information from a handheld and replay it on an emulator:
• Instrument a handheld to collect user inputs
• Transfer the initial state of a handheld to the desktop
• Start collecting inputs
• Allow the user to operate the handheld normally
• Transfer the activity log from the handheld to the desktop
• Load the emulator with the collected initial state of the handheld
• Collect processor information while replaying the activity log on the desktop
We explain each of these steps in the following sections.
2.2 Initial State Collection
Before we give a handheld to a volunteer user, we instrument it to collect user
inputs, capture its initial state, and transfer the state information to a desktop com-
puter. As described above, an initial state is necessary to recreate accurate playback
of a session. For an initial state to be accurate, it must be complete. For a handheld,
the initial state includes the contents of the flash memory, the static RAM and the
current state of the processor (current point of execution and registers).
We use ROMTransfer.prc, an application freely distributed by PalmSource [21],
to transfer a flash image from a Palm OS device to a desktop with a USB cable. To
get the necessary contents of the RAM, we set the backup bit for all the applications
and databases, and perform a HotSync [22]. HotSyncing synchronizes the programs
and databases on a handheld with copies on the desktop over a USB cable.
We chose to start every session directly after a soft reset. A soft reset on Palm OS
devices “reinitialize[s] the globals and the dynamic memory heap” [23]. Choosing a
common execution point freed us from recording the registers values and consequently,
writing these values in the emulated registers at precisely the beginning of a session.
Instead, we perform a soft reset that initiates a deterministic sequence of actions that
populates the registers.
A complete initial state consists of the flash image, all programs and databases
from RAM, and the current point of execution. This information is transferred to the
desktop at the beginning of a session.
2.3 Activity Logs
The deterministic state machine model specifies that in addition to the initial
state, a complete sequence of external inputs, or activity log, is needed to accurately
replay a session. For an activity log to be complete for a handheld, all forms of inputs
must be considered. For this reason, this section begins by discussing inputs to
handhelds. Next, our chosen method of data collection, hacks, is explained, detailing
the five hacks that we wrote to collect user inputs. This section concludes with an
evaluation of activity logs.
2.3.1 Handheld Inputs
Handheld computers are naturally interactive and allow for different forms of
input from users. To produce a complete activity log, all relevant input needs to be
recorded. A typical handheld allows for the following inputs:
• Digitizer (touch screen)
• Buttons
• Real Time Clock (RTC)
• Memory Card
• IrDA/Serial Port
• Reset Button
Figure 2.2: How A Hack Works.
Of these inputs, we collectively record stylus movements on the digitizer (touch
screen), button presses (up, down, power, HotSync cradle and four application but-
tons) and the real time clock (RTC). These three input types account for the majority
of input on handhelds. Also, the insertion, removal, and name of a memory card can
be detected with our technique. We have chosen not to use memory cards in this
study due to the extra complexity and requirements in storing activity logs and simu-
lation. Due to time constraints and an expected increase in overhead, IrDA and serial
port activity are not collected. For simplicity, the reset event was not implemented in
the simulator. Recording the event of a system turning off and on again has inherent
problems that we have left for future work.
2.3.2 Hacks
In the Palm OS community, the word hacks has a specific meaning. A hack is a
section of code contained in a routine that is “called in addition to or in lieu of the
standard Palm OS routines” [24]. By default, at the time a system routine is called,
Figure 2.3: The Record Structure For Button, Pen Up And Notification Events.
the operating system looks up its address in the trap dispatch table, jumps to that
location and continues execution. After the system routine finishes, execution contin-
ues in the application that called the routine. Inserting the address of a hack into the
trap dispatch table forces the system to call the hack instead of the default system
routine (see Figure 2.2). Since the system passes the same parameters to the hack
that it would have if it were calling the default system routine, the hack must accept
the same arguments as the system routine for proper system execution. Typically,
a hack either starts or ends by calling the default routine, and then returning the
result of that call. Often a hack manager, such as HackMaster [25], TealMaster [26]
or X-Master [27] is employed to manage modifications to the trap dispatch table. A
hack manager also provides a user interface to activate, deactivate and arrange the
order of hacks.
Unlike Palmist, which uses a hack for every system call, our simulator only requires
one hack for every system routine that processes user input. We wrote five hacks to
patch the following system routines: EvtEnqueueKey, EvtEnqueuePenPoint,
KeyCurrentState, SysNotifyBroadcast and SysRandom. Each of these
hacks opens a common database, inserts a record with the current tick counter and
the real time clock values, the event type and any necessary data. It then closes the
common database. Each hack also makes a call to the original system routine. We
now explain each of these five hacks in greater detail.
Figure 2.4: The Record Structure For A Pen Event.
EvtEnqueueKey Hack
We wrote a hack for the EvtEnqueueKey system routine to record the time
at which a user pressed a button. After the system handles an interrupt generated
by a button press, the event is placed in a queue by the EvtEnqueueKey system
routine. The up, down, power, HotSync cradle and four application buttons are all
enqueued with this function. We replaced the address of the EvtEnqueueKey
system routine in the system trap table with the address of our hack. Our hack
stores a record on the handheld containing information about the button that was
pressed. Figure 2.3 describes the record structure used. timeStamp represents the
tick counter (with a resolution of 10 msec). RTCtime is a copy of the RTCTIME
Dragonball registers and contains the hours, minutes and seconds of the RTC (real
time clock). dayInHours defines the calendar date by recording the number of
hours since January 1, 1904. eventType records the type of event for the record.
Before our hack finishes, it passes the parameters it received to the EvtEnqueueKey
system routine.
EvtEnqueuePenPoint Hack
We wrote another hack for the EvtEnqueuePenPoint system routine. This
hack records pen movements, including pen up events (generated when the user stops
pressing on the screen). PalmSource classifies the EvtEnqueuePenPoint routine
as an undocumented system use only function. The Quartus Forth Manual states that
Figure 2.5: The Record Structure For A Call To KeyCurrentState.
it takes a PointType pointer and has a return type of Err [28]. After some trial
and error verification of recording stylus movements, we determined that the Point-
Type pointer contains the digitizer coordinates of the stylus. Digitizer coordinates
vary from device to device in relationship to the objects displayed. Recalibrating
a device changes the relationship of the sampled location to the objects displayed.
Pixel coordinates, on the other hand, represent the location of the pixel on the screen.
They do not vary from device to device in relation to the objects on the screen. Our
hack calls the Palm OS system routine PenRawToScreen to convert digitizer co-
ordinates to pixel coordinates. This information, along with the timestamps, is
stored in a penEventRecord. Figure 2.4 illustrates the penEventRecord. The
variables with the same names as in a genericEventRecord have the same mean-
ings. Additionally, x and y are pixel coordinates. A pen up event does not contain
coordinate information. To conserve memory, we use a genericEventRecord (see
Figure 2.3) to record the time that this event occurred.
KeyCurrentState Hack
Although our hack for EvtEnqueueKey records the time at which a button
is initially pressed, another hack is needed to capture the duration a user pressed
the button. To accomplish this, we wrote a hack that records the value returned by
KeyCurrentState in a keyCurrentStateCallRecord. Figure 2.5 details the
structure of a keyCurrentStateCallRecord. In the record, the keyBitField
is the value returned from KeyCurrentState, representing the buttons that are
currently pressed. All other values in the record have the same representations as
before. The Palm OS system routine KeyCurrentState returns a bit pattern
representing the buttons that are currently being pressed. Each bit represents one
of the eight buttons on a Palm OS handheld (power, up and down, four application
buttons and the cradle or cable HotSync button).
SysNotifyBroadcast Hack
To capture all of the user inputs, we wrote a hack to record the relevant notifica-
tions sent by the SysNotifyBroadcast system routine. SysNotifyBroadcast,
as its name suggests, broadcasts a system notification to all applications registered
for that broadcast [23]. Notifications are used as an internal messaging protocol and
include notifications for events such as the insertion or removal of a volume (i.e.,
memory card), the deletion of a database, or the transition of the system to a sleep
state.
The SysNotifyBroadcast routine sends a sysNotifyLateWakeupEvent
notification when a user turns the system on. Similarly, it broadcasts a sysNoti-
fySleepNotifyEvent notification when the user turns the device off. When a user
presses a button to turn a device on, the system executes a different execution path
than when the device is turned off. This type of button press is not enqueued by the
EvtEnqueueKey routine. For this reason, our hack records the occurrence of a
sysNotifyLateWakeupEvent notification, storing this information in a gener-
icEventRecord (see Figure 2.3). We chose to record the event of a user turning
the device off with our hack for EvtEnqueueKey instead of with our hack for
SysNotifyBroadcast to simplify the logic in the EvtEnqueueKey hack.
Figure 2.6: The Record Structure For A Call To SysRandom.
SysRandom Hack
Our final hack records every time the operating system re-seeds the random num-
ber generator. Although re-seeding is not a user input, we discovered that it is
necessary to record the new seed value to replay some sessions that call SysRandom,
due to minuscule timing differences. The operating system seeds and re-seeds
the random number generator with calls to SysRandom.
SysRandom is a Palm OS system routine that returns a random number between
zero and sysRandomMax [23]. If it is called with a non-zero value, the parameter
is interpreted as a new seed and it returns the first random number in that sequence.
Calling it with zero returns a random number using the existing seed.
As with all the other hacks, this one records the event’s timestamps and signifies
its type in a randomCallRecord. Figure 2.6 details the randomCallRecord
record structure, with the seed representing a non-zero seed value passed to Sys-
Random. Zero seed value calls to SysRandom do not need to be recorded since the
device will deterministically issue random numbers in a sequence based on the seed.
2.3.3 Evaluation of Activity Logs
The overhead introduced by collecting the activity logs, in terms of storage re-
quirements and execution time, is negligible. The individual records each consume
twelve (see Figures 2.3 and 2.4) or sixteen bytes (see Figures 2.5 and 2.6) on the
handheld.
Figure 2.7: Average Overhead For The EvtEnqueueKey Hack.
We found that volunteer users could operate a device normally for more than a
week without having the database reach the maximum number of records (65,536).
Gannamaraju and Chandra describe an implementation to overcome this barrier at
the expense of additional execution time [13]. If the database contains the maximum
number of the largest size records, it would require a total of 1536 KB of memory for
the records and the database header information.
We asked volunteer users to qualitatively indicate any general differences in exe-
cution times with hacks enabled and disabled. They reported no perceived difference
except for computationally intensive applications such as Graffiti Anywhere [29].
We quantitatively measured the overhead of the EvtEnqueuePenPoint hack by
counting the number of pen events per second in the database with the stylus con-
tinuously pressed against the screen. The test was implemented on a Palm m515,
which samples pen movements 50 times a second. The database initially contained
no records. The device recorded an average of 50.0 pen events per second in the
database, indicating no perceptible overhead for pen sampling.
Figure 2.8: Average Overhead For Each Hack.

We also designed a test that called a hack in a tight loop on a handheld to further
quantify the overhead. The test eliminated the call to the original system routine
to isolate the overhead associated with the hack. Figure 2.7 illustrates the average
execution time per call of the EvtEnqueueKey hack. This test revealed that the
overhead of each call is proportional to the size of the database. For example, the
average overhead for the EvtEnqueueKey hack when the database had zero to 10,000
records was 6.4 ms. However, when the database had 50,000 to 60,000 records, the
average overhead was 15.5 ms per call. Given that the position of the stylus is sampled
every 20 ms, the overhead introduced by a hack when the database is nearly full is
significant.
To keep the overhead at an acceptable level, we limited sessions to shorter time
periods (e.g., two to three days). Figure 2.8 depicts the average overhead of each
hack averaged over the first 30,000 iterations. Limiting sessions to two or three days
generally kept the number of records well below 30,000. The overhead varied only
slightly for each hack.
2.3.4 Summary of Activity Logs
We wrote five hacks to record activity logs containing user inputs on handheld
devices. These hacks record all pen and button activity produced by a user. Further-
more, our hacks have an acceptable amount of overhead.
2.4 Playback of Activity Logs
The initial state and an activity log provide very little information in and of them-
selves. Their purpose is to provide the necessary information to replay the workload
for the time period they were collected on a handheld. As stated in Section 2.1, an
equivalent system can be used to start in the same initial state and replay the same
inputs. This equivalent system can then be used to provide statistics and perfor-
mance information without disturbing the integrity of the original workload. Our
tool utilizes the Palm OS Emulator, POSE, as our equivalent system.
In this section, we first introduce POSE. We then explain the modifications we
made to it. Next, we provide details of how we set up our simulator for playback.
Finally, some limitations are described.
2.4.1 The Palm OS Emulator
The Palm OS Emulator, POSE [21], is a publicly available and widely used devel-
opment tool for handhelds running Palm OS [30]. POSE, maintained by PalmSource
Inc., is based on Greg Hewgill’s Copilot [31]. It runs on a desktop system, and
emulates Palm OS based systems (e.g., Handspring, Palm, and Symbol handhelds).
POSE does this by keeping track of the processor state and fetching instructions out
of a flash image and an allocated RAM. Stylus and button presses are simulated with
the mouse. The emulator generates an event record to process the input. This event
record is returned to the system when EvtGetEvent is called (typically in the
application’s event loop). Our simulator is a modified version of POSE 3.5 that is
instrumented to replay activity logs and collect performance data.
POSE is a suitable framework for playback for several reasons. First, POSE ac-
curately emulates the Palm OS and the supporting hardware. It fetches the same
instructions as a handheld and updates registers and memory banks just as a phys-
ical device does. Second, POSE is already a widely used and accepted tool in the
development community for Palm OS devices. Third, it is free and distributable,
licensed under the GNU General Public License [32]. Fourth, POSE allows for
quicker development by building on an existing software base. Finally, the source
code for POSE is available for Windows, UNIX and Mac OS platforms.
The Palm OS Emulator offers many resources for profiling, debugging, and testing.
Two functions relevant to this project are Gremlins and Profiling. Gremlins provide
pseudorandom events to flush out bugs in applications. A developer can set up
Gremlins to run for a specified number of events or indefinitely. When an error occurs,
Gremlins stops and the same sequence of events can be run again. Profiling provides
a developer with detailed information on the routines and locations the application
executes. It is set up to collect statistics and write them to a file, and can be enabled
or disabled at runtime.
2.4.2 Modifications To POSE
We first modified POSE to generate stylus movements and button presses from an
activity log. During initialization POSE reads a parsed activity log file and divides
it into three groups — a list of events to be replayed synchronously with the tick
counter and two queues. One queue is for KeyCurrentState key bit fields and
the other is for random seeds (generated by non-zero SysRandom calls). We use
Gremlins as a template to execute the synchronous list of events (i.e., pen down, pen
up and button presses).
To ensure the correct timing of the synchronous events, the emulated system’s
tick counter is checked to see if it is greater than or equal to the tick timestamp of
the next event. If it is time for the next event, the emulator simulates the event
and then continues to execute the same instructions as the handheld would until
the next event is scheduled to occur. This process continues until the synchronous
event log is exhausted.
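The tick-synchronous replay loop above can be sketched as follows. The Event and FakeEmulator classes are hypothetical stand-ins for the parsed activity-log entries and the instrumented POSE core; only the timing logic mirrors the description.

```python
from dataclasses import dataclass

@dataclass
class Event:               # hypothetical parsed activity-log entry
    tick: int
    kind: str

class FakeEmulator:        # stand-in for the instrumented emulator core
    def __init__(self):
        self.ticks = 0
        self.injected = []
    def step(self):                  # execute "one instruction"
        self.ticks += 1
    def inject(self, event):         # simulate the pen/button event
        self.injected.append((self.ticks, event.kind))

def replay(events, emu):
    # Fire each synchronous event once the emulated tick counter is
    # greater than or equal to its recorded tick timestamp.
    pending = sorted(events, key=lambda e: e.tick)
    i = 0
    while i < len(pending):
        emu.step()
        while i < len(pending) and emu.ticks >= pending[i].tick:
            emu.inject(pending[i])
            i += 1

emu = FakeEmulator()
replay([Event(5, "penDown"), Event(9, "penUp")], emu)
assert emu.injected == [(5, "penDown"), (9, "penUp")]
```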
The KeyCurrentState key bit field and random seed queues are utilized when-
ever their respective system calls are made. To ensure proper execution, the emulator
handles calls to the system routines KeyCurrentState and SysRandom differently.
For the routine KeyCurrentState, the emulator executes the function, then
looks up the appropriate key bit field to return based on the emulated tick timer and
the queue elements’ tick timestamps. The emulator handles calls to SysRandom in
much the same way when its parameter is non-zero. The difference is that before
SysRandom is called, the parameter is overwritten with the next seed value from
the queue, and execution continues. When the parameter for
SysRandom is zero, SysRandom is executed as normal, returning the next random
number from the existing seed’s sequence.
We further modified POSE to track and output statistical execution information
such as opcodes and memory references. To record the opcodes, we treated each exe-
cuted opcode as an index into an array, and incremented the respective array element.
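The opcode-counting instrumentation amounts to a 65,536-entry histogram, since the first word of a 68K instruction is 16 bits wide. A sketch (the opcode stream below is illustrative):

```python
# One counter per possible 16-bit opcode word.
opcode_counts = [0] * 0x10000

def record_opcode(opcode):
    # Treat the executed opcode as an index into the array and
    # increment the respective element.
    opcode_counts[opcode & 0xFFFF] += 1

for op in (0x4E75, 0x4E75, 0x7001):   # RTS, RTS, MOVEQ #1,D0
    record_opcode(op)
assert opcode_counts[0x4E75] == 2
```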
Furthermore, we modified Profiling functions to track the memory references. These
values can be written to a file. We modified POSE to turn Profiling on and off upon
starting and stopping Gremlins. When Profiling is enabled, POSE’s native speed
optimizations are ignored in favor of the original Palm OS code. A simple example of
this involves the TrapDispatcher. A Palm OS compiler inserts Trap instructions
so that the system can handle system calls. When a Trap instruction is executed, and Pro-
filing is enabled, the “TrapDispatcher looks at the program counter that was pushed
onto the stack, uses it to fetch the dispatch number following the TRAP opcode, and
uses the dispatch number to jump to the requested ROM function. [When Profiling
is disabled,] the emulator essentially bypasses the whole TrapDispatcher function and
. . . does the same operations with native x86, PPC, or 68K code” [33]. If Profiling
were not enabled, the emulator would skip executing several instructions that a
physical device would execute, invalidating the collected data.
2.4.3 Simulator Setup
To use the simulator to play back a session, we first specify the filename of the
activity log in a plain text file. Next, we run the emulator. After the emulator
is finished initializing, we import all of the applications and databases corresponding
with the initial state of the specified session. We then reset the emulator to get it
into the same processor state as when the activity log started. Finally, we tell the
emulator to start Gremlins which begins the replay of events.
2.4.4 Limitations
Although our simulator works extremely well for workloads with state-based ap-
plications, choosing the Palm OS Emulator for playback does come with some limi-
tations with respect to timing. POSE simulates the processor’s tick counter but not
the real-time clock (RTC). We approximated the RTC by using the amount of time
elapsed on the emulator’s host machine (i.e., a desktop) since the last event. All
events contain an RTC timestamp. Furthermore, due to the approximated RTC and
other fine-grained timing differences (e.g., memory reads and interrupts), the
combination of POSE and our methods is not appropriate for timing-sensitive applications.
The remaining limitations originate from the failure to fully implement functions
found on handheld devices such as the serial port and memory card. These limitations
are addressed in Section 5.1, Future Work.
By using the Palm OS Emulator to replay activity logs and collect data, we
gather information about the workload on the handheld. POSE is used to replay
users’ activity because it is accurate and easily distributable.
Chapter 3
System Validation
In the previous chapter, we discussed the deterministic state machine model. This
model guarantees that equivalent deterministic state machines that start in the same
state and have the same inputs applied follow the same execution path and end in
the same state. To validate the accuracy of our trace-driven simulator we employed a
two-fold approach. First, we compared the inputs recorded on a Palm OS device while
executing a test workload with the inputs recorded during playback on the emulator.
Next, we correlated the handheld’s final state with the final emulated state.
In both cases we used three different test workloads for validation.
3.1 Test Setup
We performed both of the validation tests on a Palm m515 with a 33 MHz Motorola
Dragonball MC68VZ328 processor and 16 MB of RAM. We utilized X-Master [27]
to manage the hacks (see Section 2.3.2). We loaded three additional applications to
prepare the handheld to collect activity logs. We wrote the first two applications
to create a common database (the activity log) and to set all the backup bits of
the databases. The third application, FileZ, a file manager, is produced by nosleep
software [34] and is widely distributed as freeware.
The initial state for the first test workload mirrors the default factory settings for
the system, except for the presence of the above mentioned applications and database.
The initial state of the second test workload is the same as the final state for the first
test workload. Also, the initial state of the last workload is the same as the final state
of the second workload.
To begin the simulations, we loaded the simulator with the initial state by import-
ing the applications and databases for the respective session and reset the simulator.
To begin playback, we invoked Gremlins. After the simulator replayed the last event
in the activity log, we stopped Gremlins. To obtain the activity log and the final
emulated state, we HotSynced the emulated system.
3.2 Test Workloads
The following three test workloads were set up as described in the previous section.
Workloads 1 and 2 followed a predefined script of actions. Workload 3 illustrates
playing a game.
Workload 1
The first test involved entering a moderate amount of text into Palm’s Memopad
application. The general format was inspired by a volunteer user. The text entered
into Memopad was exactly as follows:
Credit Card Purchases
Date Item Amount
3/5 Milk $3.17
3/6 Head Lights $26.55
3/7 Pizza $5.36
3/8 Pop $.73
3/8 Safety Inspections $30.00
3/9 Food $14.54
3/9 Gas $14.44
3/10 Friend $8.00
3/10 Calling Card $10.00
3/13 Juice $2.70
3/13 Gas $18.00
Figure 3.1: Palm’s Puzzle Game.
Workload 2
Workload 2 exercised Palm’s four basic applications, Datebook, Address, To Do
and Memopad. First, a Datebook entry for June 18, 2004 entitled Last day to
schedule thesis defense was entered. An alarm was also set to go off 15 days
before the event. Next, an address record was entered with the following information:
Title: Computer Science Department
Company: Brigham Young University
Main Phone: (801) 422-3027
Fax: (801) 422-0169
Address: 3361 TMCB PO Box 26576
City: Provo
State: Utah
Zip: 84602-6576
Then, a To Do item with the label of Graduate was entered with a due date of 7/2/04.
Finally, a memo containing the following text was entered:
To err is human, but to really foul things up requires a computer.
- Farmers’ Almanac, 1978
Workload 3
Workload 3 illustrates the simulator’s ability to replay some activity logs with
random number calls. For this test, we solved a game of Puzzle. Puzzle is bundled
with the Palm Desktop Software [35]. The game consists of 15 squares numbered 1
to 15 and an open slot confined in a square frame (see Figure 3.1). The object of
the game is to arrange the squares in ascending order. The squares are moved by
pressing on them and dragging the stylus to the desired location. A new randomly
arranged game is started by pressing the New Puzzle button.
3.3 Activity Log Correlation
To validate the simulator, we first verified that the inputs collected from the
physical device were replayed on the simulator. To do this, we compared the activity
log generated while executing a test case and the activity log created during playback.
To obtain the activity log on the simulator, we imported our hacks and X-Master along
with the other applications and databases. Since the simulator accurately emulates
the handheld system, the hacks were executed just as they did on the handheld.
The activity log from the handheld and that of the emulated session correlate
very well. Each pen event recorded in the original activity log also appeared in the
emulated activity log with the same coordinates. Furthermore, all of the button
events were accurately replayed. However, the events in the emulated activity log
sometimes occurred in short bursts. A burst of events would occur slightly behind
schedule (< 20 ticks), and then return to being replayed at exactly the right tick
count. The bursts are believed to be due to the different threads of the simulator
running at different times, delaying the core processing thread momentarily. For this
same reason, the KeyCurrentState events are at times slightly off in the emulated
activity log. Although the activity logs from the initial session and the emulated
session do not perfectly match, they contain virtually the same inputs, retaining the
integrity of the log.
3.4 Final State Correlation
To further validate the simulator, we compared the final state of a test session on
a handheld and the final state of the emulated session. The correlation of the two
final states is not a holistic validation but reflects the aggregate of the events. Since
the final state of a session is largely determined by the state of memory, we analyzed
the databases that were transferred to the desktop via a HotSync at the end of a
session.
On a Palm OS device, applications are stored internally in the same format as
record databases. Each database starts with a header, followed by the indexes of
the records, and then the records themselves. The only structural difference between
an application database and a record database is the presence of one or more code
(or other) resources.
To quantify the differences between the final state of the handheld session and
that of the emulated session, we compared the respective databases field by field.
The databases correlated extremely well. The only exceptions are three fields entitled
creationDate, lastBackupDate, and modificationDate and the database
named psysLaunchDB. The three fields represent the number of seconds since 12:00
A.M. on January 1, 1904 that the database was either created, backed up, or modified,
respectively. These three fields regularly differ between the two final states. The
differences are attributed to the procedure of importing and exporting the databases
to and from the simulator. The creationDate field for databases on the simu-
lator was always zero, since the databases were imported instead of created. The
lastBackupDate field was also always zero for the same databases because they
were never backed up on the emulated system. The modificationDate field for
these databases was also zero unless it had a timestamp corresponding with the date
of replay. Exporting the databases for the emulated session caused some databases
to record the time of replay as the modificationDate. The differences in the
timestamps in the database headers do not adversely affect the performance of the
simulator.
The psysLaunchDB database is used solely by the operating system and has an
unpublished format. We estimate from its name and raw contents that it stores
information about applications that can be run from the home screen. The few
single-byte differences between the records of the two databases are also attributed
to the procedure of loading databases into the simulator. We estimate that the execution
of the system is not adversely affected by the differences in psysLaunchDB.
Our validation shows that the simulator replays activity logs very accurately. The
few differences do not affect the integrity of the simulation, making it suitable for
performance and analysis studies.
Chapter 4
Cache Case Study
This chapter presents an in-depth case study analyzing the effects on memory performance
of adding different caches to a Palm OS device. In this chapter we present
what we believe to be the first cache simulation results for devices running Palm OS.
4.1 Introduction
Over the past decades, the gap between processor and memory performance has
widened (see Figure 4.1). Hennessy and Patterson report that processor performance
has increased annually by 55% since 1986 while memory lags behind with an increase
of 7% per year [36]. To reduce this gap, memory hierarchies have been designed to
take advantage of locality to maximize performance and minimize cost.
A simple memory hierarchy is shown in Figure 4.2. The memory closest to the
CPU is referred to as a cache. A cache contains a subset of the information found in
memory but is designed to respond more quickly.
When the processor requests the contents of a memory location from a cache, and
that information is found, it is called a hit. Conversely, if the contents are not found
in the cache, it is called a miss. When a miss occurs, a block is read from memory,
placed in the cache, and returned to the processor. A block is typically in the tens of
bytes, and contains the contents of the desired location and adjacent addresses.
Figure 4.1: CPU And Memory Performance (Data From Hennessy And Patterson [36]).
Figure 4.2: A Simple Memory Hierarchy.
As shown in Figure 4.3, the memory address is divided into sections to locate the
desired block and the offset within that block in the cache. The least significant bits
define the offset within a block. The middle bits of the address specify which set
to look in, and are called the index bits. The most significant bits of the address
are the block’s ID. This part of the address is referred to as the tag. A cache has a
corresponding entry for every block. This allows a cache to determine if a given block
in the right set of a cache is the desired block.
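The tag/index/offset decomposition described above reduces to a few shifts and masks. The helper below is an illustrative sketch, assuming power-of-two line size and set count:

```python
def split_address(addr, line_size, num_sets):
    """Split a memory address into (tag, set index, byte offset).

    line_size and num_sets must be powers of two."""
    offset_bits = (line_size - 1).bit_length()   # e.g., 32-byte line -> 5 bits
    index_bits = (num_sets - 1).bit_length()
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split_address(0x12345678, line_size=32, num_sets=256)
# The three fields reassemble into the original address (5 + 8 = 13 low bits).
assert (tag << 13) | (index << 5) | offset == 0x12345678
```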
When a cache miss occurs, an existing block must be displaced to make room for
Figure 4.3: Components Of A Cache Address.
Figure 4.4: Palm Memory Hierarchy.
the requested block. The most common algorithm for selecting the block to replace
is the least-recently used (LRU) algorithm.
Average effective memory access time is used to quantify the performance of various
cache configurations and is described by the following equation:
Teff = Thit + MR · Tmiss    (4.1)
Thit is the time (in CPU clock cycles) that a cache requires to service a cache hit.
MR, the miss rate of the cache, specifies the percentage of requests that result in a
miss. Tmiss is the time (in CPU clock cycles) that a memory hierarchy requires to
deliver the desired block on a cache miss.
Since Palm OS devices have both RAM and flash memory (see Figure 4.4), the
average effective memory access time for a Palm OS device is computed with the
following equation:
Teff = Thit + MR · (REFRAM/REFtotal · TRAM miss + REFflash/REFtotal · Tflash miss)    (4.2)
where REFtotal = REFRAM + REFflash. TRAM miss and Tflash miss are the CPU cycles
Session RAM Refs Flash Refs Events Real Time Run Time Ave Mem Cyc
1 213,732,035 443,329,207 1243 24:34:31 7:01 2.35
2 31,261,532 68,897,235 933 48:28:56 4:47 2.38
3 33,670,308 76,069,375 755 24:52:55 3:56 2.39
4 234,378,096 485,808,992 1622 141:27:26 8:44 2.35
Table 4.1: Volunteer User Session Data.
required to fetch a block from the RAM and flash respectively. REFRAM and REFflash
are the number of RAM and flash references.
The following equation is used to calculate the average e!ective memory access
time for a system without a cache:
Teff = REFRAM/REFtotal · TRAM miss + REFflash/REFtotal · Tflash miss    (4.3)
With a handheld device, not only is performance important, but power consump-
tion is as well. The system performance of a device is the time required to perform a
task. With a handheld, power consumption defines the amount of time a device can
be used. Studies have shown that adding a cache not only increases performance but
can also reduce battery consumption for portable devices [37].
4.2 Experimental Setup
Table 4.1 summarizes the four sessions used in our cache simulations. Events refers
to the number of entries in each of the activity logs. Real Time is the amount of time
(in HH:MM:SS) each session spans. Run Time is the period of time (in MM:SS) that
the processor was active (either running or dozing, but not sleeping). Ave Mem Cyc
is the average effective memory access time (in cycles) without a cache, as computed
with Equation 4.3.
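As a consistency check, session 1's Ave Mem Cyc value can be reproduced from its RAM and flash reference counts using Equation 4.3 with the Dragonball's one-cycle RAM and three-cycle flash access times (given in Section 4.3):

```python
# Session 1 from Table 4.1.
ram_refs, flash_refs = 213_732_035, 443_329_207
total = ram_refs + flash_refs

# Equation 4.3: weighted average of the RAM (1 cycle) and flash
# (3 cycle) access times, with no cache present.
t_eff = (ram_refs / total) * 1 + (flash_refs / total) * 3
print(round(t_eff, 2))   # 2.35, matching the table
```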
The volunteer user operated a Palm m515 device with 16 MB of RAM, 4 MB of
flash and a 33 MHz Dragonball MC68VZ328 processor (which does not have a cache).
Before each session, we initialized the device and collected the necessary information
for the simulator, as described in Chapter 2. While replaying a session with the
emulator, we collected a log of the memory requests. We used these logs, or traces,
with a cache simulator that we wrote to obtain cache miss rates. We simulated 56
different cache configurations by varying the cache size, line size, and associativity.
The LRU replacement policy was used in every configuration.
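The cache simulator itself is not listed in the thesis; the following is an illustrative sketch of a set-associative cache with LRU replacement in the same spirit, using an ordered dictionary per set to track recency:

```python
from collections import OrderedDict

class Cache:
    """Sketch of a set-associative cache with LRU replacement."""
    def __init__(self, size, line_size, assoc):
        self.line_size = line_size
        self.assoc = assoc
        self.num_sets = size // (line_size * assoc)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        s = self.sets[block % self.num_sets]
        if block in s:
            s.move_to_end(block)        # mark most-recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(s) >= self.assoc:
                s.popitem(last=False)   # evict least-recently used
            s[block] = True

cache = Cache(size=1024, line_size=32, assoc=2)
for addr in (0, 4, 32, 0, 2048):        # illustrative address trace
    cache.access(addr)
assert (cache.hits, cache.misses) == (2, 3)
```

Feeding the memory-reference traces through such a simulator for each size/line-size/associativity combination yields the miss rates plotted in Figure 4.5.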
4.3 Cache Simulation Results
Figure 4.5 illustrates miss rates for the 56 cache configurations of one session.
These results are typical of the other sessions in Table 4.1. Caches with a line size
of 32 bytes performed better than those with 16-byte lines, except for the largest
cache sizes simulated with 4- and 8-way set associativities. Furthermore, increasing
the associativity typically decreases the miss rate.
The Dragonball MC68VZ328 requires one cycle for RAM accesses and three cycles
for flash accesses [38]. With the flash contributing about two thirds of the total
references, the average effective memory access time is closer to a flash access time
than to a RAM access time (see Table 4.1). We used Equation 4.2 with Thit equal
to one cycle and TRAM miss and Tflash miss equal to one and three cycles respectively to
compute the average effective memory access times. Figure 4.6 displays the results for
the same 56 cache configurations. In all configurations, adding a cache significantly
reduces the average memory access time.
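To see why the reduction can exceed 50%, plug a hypothetical miss rate into Equation 4.2; the 5% figure below is illustrative only, not a measured result from Figure 4.5.

```python
# Session 1 reference mix from Table 4.1.
ram_refs, flash_refs = 213_732_035, 443_329_207
total = ram_refs + flash_refs

# Average miss penalty: RAM miss = 1 cycle, flash miss = 3 cycles.
t_miss_avg = (ram_refs / total) * 1 + (flash_refs / total) * 3

no_cache = t_miss_avg                # Equation 4.3
mr = 0.05                            # hypothetical 5% miss rate
with_cache = 1 + mr * t_miss_avg     # Equation 4.2 with T_hit = 1

assert with_cache < 0.5 * no_cache   # better than a 50% reduction
```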
The small cache sizes used in this study exhibit the same miss rate trends found in
larger caches used in desktop systems. Figure 4.7 shows the miss rates for a desktop
system, exemplifying these trends. This sample comes from the Trace Distribution
Center at Brigham Young University, a repository of desktop and server traces.
4.4 Conclusion
Average memory access times for current Palm OS workloads suffer because requests
to the flash, which require three times as many cycles to complete as requests to the
RAM, dominate the total references. Our cache simulations show that even relatively
small caches can reduce the effective memory access time by 50% or more. This is
mostly because the flash memory receives the majority of references. Adding a cache
to a Palm OS device using a Dragonball MC68VZ328 processor can greatly reduce
the average effective memory access time and potentially reduce battery consumption.
Figure 4.5: Miss Rates For 56 Cache Configurations.
Figure 4.6: Average Effective Memory Access Times.
Figure 4.7: Miss Rates For A Desktop Address Trace.
Chapter 5
Conclusion
Hardware prototypes are expensive to build. To minimize the overall development
cost of new hardware, system architects rely on simulation to guide development
choices. Trace-driven simulators provide more accurate performance estimates than
other simulation approaches.
We have designed a trace-driven simulator for Palm OS devices. First, the initial
state is collected from a handheld device, then the device is instrumented to record
external inputs with five di!erent hacks. The log of external inputs, or activity log,
is collected while an actual user is running real applications under normal operating
conditions. After transferring the activity log to a desktop system, we replayed it on
an equivalent system, instrumented to record opcode usage, memory references and
performance statistics. With this data, tests such as energy consumption and cache
simulations can be realistically and accurately performed. Furthermore, our cache
simulations show that adding even a small cache memory to a Palm OS device can
greatly reduce the average effective memory access time.
5.1 Future Work
While our simulator provides statistical performance metrics for the most com-
monly used features of Palm OS devices, it does not currently replay activity logs
41
that involve memory cards, IrDA or serial port activity, resets, or battery information.
Each of these features has been left for future work and would allow a more
diverse group of sessions to be replayed with the simulator. Furthermore, a more
comprehensive study of usage behavior with more users over a longer time frame
would provide more accurate estimates for cache simulations and other such tests.
LIST OF REFERENCES
[1] J. C. Mogul and A. Borg, “The effect of context switches on cache performance,”
in Proceedings of the 4th International Conference on Architectural Support for
Programming Languages and Operating Systems. ACM, 1991, pp. 75–84.
[2] A. Borg, R. E. Kessler, and D. W. Wall, “Generation and analysis of very long
address traces,” in Proceedings of 17th International Symposium on Computer
Architecture. ACM, 1990, pp. 270–279.
[3] O. A. Olukotun, T. N. Mudge, and R. B. Brown, “Implementing a cache for
a high-performance GaAs microprocessor,” in Proceedings of the 18th International
Symposium on Computer Architecture. ACM, 1991, pp. 138–147.
[4] D. A. Wood, M. D. Hill, and R. E. Kessler, “A model for estimating trace-sample
miss ratios,” in Proc. of 1991 ACM Sigmetrics. ACM, 1991, pp. 79–89.
[5] S. Przybylski, M. Horowitz, and J. Hennessy, “Characteristics of
performance-optimal multi-level cache hierarchies,” in Proceedings of the 16th
Annual International Symposium on Computer Architecture. IEEE, 1989.
[6] J. K. Flanagan, B. E. Nelson, J. K. Archibald, and K. Grimsrud, “Incomplete
trace data and trace driven simulation,” in Proceedings of the International
Workshop on Modeling, Analysis and Simulation of Computer and Telecommu-
nication Systems MASCOTS. SCS, 1993, pp. 203–209.
[7] K. Grimsrud, J. K. Archibald, B. E. Nelson, and J. K. Flanagan, “Estimation of
simulation error due to trace inaccuracies.” IEEE Asilomar Conference, 1992.
[8] A. Borg, R. E. Kessler, G. Lazana, and D. W. Wall, “Long address traces from
risc machines: generation and analysis,” Research Report 89/14, Sept 1989.
[9] R. M. Fujimoto and W. C. Hare, “On the accuracy of multiprocessor tracing
techniques,” Georgia Institute of Technology, Report GIT-CC-92-53, June
1993.
[10] T. Kort, R. Cozza, L. Tay, and K. Maita, “PalmSource and Microsoft tie for 1Q04
PDA market lead,” Gartner Dataquest, Tech. Rep., April 2004.
[11] C. Bey, E. Freeman, G. Hillerson, J. Ostrem, R. Rodriguez, and
G. Wilson, “Palm OS Programmer’s Companion Volume I,” Available:
http://www.palmos.com/dev/support/docs/ [accessed July 2001], 1996-2001.
[12] ——, Palm OS Programmer’s Companion Volume I, PalmSource, 1996-2001.
[13] R. Gannamaraju and S. Chandra, “Palmist: A tool to log Palm system activity,”
in Proceedings of the IEEE 4th Annual Workshop on Workload Characterization
(WWC-4), Dec. 2001, pp. 111–119.
[14] C. Rose and K. Flanagan, “Complete instruction traces from incomplete address
traces (CITCAT),” Computer Architecture News, vol. 24, no. 5, pp. 1–8, Dec
1996.
[15] C. Rose, “CITCAT: Constructing Instruction Traces from Cache-filtered Address
Traces,” Master’s thesis, Brigham Young University, April 1999.
[16] J. Flinn and M. Satyanarayanan, “Powerscope: A tool for profiling the energy
usage of mobile applications,” in Workshop on Mobile Computing Systems and
Applications (WMCSA), Feb 1999, pp. 2–10.
[17] N. Vijaykrishnan, M. T. Kandemir, M. J. Irwin, H. S. Kim, and
W. Ye, “Energy-driven integrated hardware-software optimizations using
SimplePower,” in ISCA, 2000, pp. 95–106. [Online]. Available:
citeseer.nj.nec.com/vijaykrishnan00energydriven.html
[18] W. Ye, N. Vijaykrishnan, M. T. Kandemir, and M. J. Irwin, “The
design and use of SimplePower: a cycle-accurate energy estimation tool,”
in Design Automation Conference, 2000, pp. 340–345. [Online]. Available:
citeseer.nj.nec.com/ye00design.html
[19] S. Lee, A. Ermedahl, S. L. Min, and N. Chang, “An accurate instruction-level
energy consumption model for embedded RISC processors,” in LCTES/OM,
2001, pp. 1–10. [Online]. Available: citeseer.nj.nec.com/lee01accurate.html
[20] T. L. Cignetti, K. Komarov, and C. S. Ellis, “Energy estimation tools for the
Palm,” in ACM MSWiM 2000: Modeling, Analysis and Simulation of Wireless
and Mobile Systems, Aug. 2000.
[21] PalmSource, Inc., Palm OS Emulator, Available:
http://www.palmos.com/dev/tools/emulator/ [accessed July 2001].
[22] Palm, Inc., “Palm support: HotSync technology support index,” Available:
http://www.palm.com/us/support/hotsync.html [accessed 2003].
[23] C. Bey, E. Freeman, G. Hillerson, J. Ostrem, R. Rodriguez, and
G. Wilson, “Palm OS Programmer’s API Reference,” Available:
http://www.palmos.com/dev/support/docs/ [accessed July 2001], 1996-2001.
[24] E. Keyes, “The HackMaster Protocol,” Available:
http://www.daggerware.com/hackapi.htm [accessed 2001].
[25] ——, “HackMaster v0.9,” Available: http://www.daggerware.net/hackmstr.htm
[accessed 2001].
[26] “TealMaster system extensions manager,” Available:
http://www.tealpoint.com/softmstr.htm [accessed 2003], TealPoint Software,
2004.
[27] LinkeSOFT, “X-Master 1.5 free extension (hack) manager,” Available:
http://linkesoft.com/xmaster/ [accessed 2002].
[28] N. Bridges, “Quartus Forth Manual,” Available:
http://www.quartus.net/products/forth/manual/specific.htm [accessed January
2004], 1998-1999.
[29] The Graffiti Anywhere Page, Available: http://www.escande.org/palm/GrfAnywhere/
[accessed 24 May 2004].
[30] PalmSource, Inc., “Palm OS, Palm Powered handhelds, and 18,000 software applications,”
Available: http://www.palmsource.com [accessed July 2001].
[31] G. Hewgill, “Copilot: Design and development,” Handheld Systems, vol. 6, no. 3,
May/June 1998.
[32] GNU General Public License, Free Software Foundation,
http://www.gnu.org/copyleft/gpl.html.
[33] K. Rollin, “The Palm OS Emulator,” Handheld Systems, vol. 6, no. 3, May/June
1998.
[34] “The nosleep software open source project,” Available:
http://nosleepsoftware.sourceforge.net/download.html [accessed April 2004],
September 2002.
[35] “Palm Desktop software for Windows,” Available:
http://www.palmone.com/us/support/downloads/win desktop.html [accessed
April 2004], palmOne, Inc., 1995-2001.
[36] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative
Approach, 3rd ed. Morgan Kaufmann Publishers, 2003.
[37] J. Su, “Cache optimization for energy efficient memory hierarchies,” Master’s
thesis, Brigham Young University, 1995.
[38] MC68VZ328 Integrated Processor User’s Manual, Motorola, February 2000.