1
Two Case Studies in Predictable Application Scheduling Using Rialto/NT
Michael B. Jones – Microsoft Research
John Regehr – University of Virginia
Stefan Saroiu – University of Washington
2
Application Case Studies
Two applications needing predictable execution on Windows 2000:
  Soft Modem Driver
  Digital Audio Player
The case studies:
  analyze behavior on normal Windows 2000
  study improvements possible using the Rialto/NT CPU Reservation mechanism
3
Consumer Real-Time
General-purpose operating systems, such as Windows 2000:
  maximize aggregate throughput
  approximate fair sharing of the resources
Increasing use of time-dependent tasks:
  signal processing, audio, video
Need support for:
  predictable scheduling for independently developed applications
  low latency responses
  explicit resource allocation mechanisms
4
Rialto/NT Abstractions
Two real-time software abstractions:
  CPU Reservations – ongoing reservation for at least X time units out of every Y units for a thread
  Time Constraints – one-shot time reservation for a specified amount of work between a start time and a deadline
Case studies use only CPU Reservations
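The "X time units out of every Y" form of a CPU Reservation can be illustrated with a small model. This is only a sketch: the class name `CpuReservation` and the function `can_admit` are hypothetical, and the admission check (total reserved share no greater than 100%) is a deliberate simplification, not a description of how Rialto/NT decides whether to grant a request. The two example reservations, 2ms/8ms and 1ms/16ms, are the ones used later in these slides.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CpuReservation:
    """At least `amount_ms` of CPU out of every `period_ms` (X out of every Y)."""
    amount_ms: int
    period_ms: int

    def __post_init__(self):
        if not 0 < self.amount_ms <= self.period_ms:
            raise ValueError("need 0 < amount <= period")

    @property
    def share(self) -> Fraction:
        """Fraction of the CPU this reservation claims."""
        return Fraction(self.amount_ms, self.period_ms)


def can_admit(existing, candidate) -> bool:
    """Simplified check (an assumption): reserved shares must total at most 100%."""
    total = sum((r.share for r in existing), Fraction(0)) + candidate.share
    return total <= 1


modem = CpuReservation(2, 8)      # 2ms every 8ms, i.e. 25%
decoder = CpuReservation(1, 16)   # 1ms every 16ms, i.e. 6.25%
print(float(modem.share), float(decoder.share), can_admit([modem], decoder))
```

Exact fractions are used so that shares such as 1/16 compare without rounding error.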
5
Rialto/NT Implementation
Rialto/NT developed on top of Windows 2000 priority scheduler
Limitations:
  CPU Reservations must be integer multiples of milliseconds
  Frequency of reservations must be power-of-two multiple of 1ms
7
Why Study Soft Modems?
Signal processing done on host CPU:
  requires predictable scheduling
  requires low latency responses
  while coexisting with other system activities
Soft Modem is a background real-time task
Successful in home computer market:
  Low cost
  Easy to update – software upgrade
8
Methodology
Instrumented Windows 2000 performance kernel:
  Logs predefined and custom events
  Writes them to a memory buffer
  Dumps buffers to disk at end of trace
Driver Software:
  No source for signal processing code
Measurement Environment:
  All experiments run with normal-priority spinning competitor thread
System:
  Windows 2000 Professional
  Pentium II 450 MHz (uniprocessor)
  384 MB ECC SDRAM - 100 MB allocated to logging
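The logging approach described above (record events in memory, write them out only when the trace ends) can be sketched in user space. The class `TraceBuffer` and its methods are hypothetical names for illustration; the instrumented kernel's actual interface is not shown in these slides.

```python
import io
import time


class TraceBuffer:
    """Fixed-capacity in-memory event log, written out only when tracing ends."""

    def __init__(self, capacity: int):
        self._events = [None] * capacity   # preallocated so logging stays cheap
        self._count = 0

    def log(self, name: str, value: int = 0) -> bool:
        """Record one timestamped event; returns False once the buffer is full."""
        if self._count == len(self._events):
            return False
        self._events[self._count] = (time.perf_counter_ns(), name, value)
        self._count += 1
        return True

    def dump(self, out) -> int:
        """Write all recorded events to a file-like object; returns the event count."""
        for stamp, name, value in self._events[:self._count]:
            out.write(f"{stamp}\t{name}\t{value}\n")
        return self._count


trace = TraceBuffer(capacity=4)
trace.log("isr_enter")
trace.log("isr_exit")
sink = io.StringIO()          # stands in for the on-disk trace file
written = trace.dump(sink)
```

Deferring the disk write keeps I/O out of the measured interval, which is presumably why part of the machine's memory is set aside for logging.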
9
Vendor Driver Version - Processing in Interrupt (INT)
Operation of the modem:
  1. DMA transfers between A/D and D/A and physical memory
  2. When enough data samples have accumulated, the modem raises an interrupt
  3. Inside the ISR, process incoming data and provide outgoing samples before the buffers are exhausted
Uses input and output data buffers holding 512 16-bit samples (1024 bytes/buffer)
10
Three Additional Versions
DPC Version (DPC):
  The ISR queues a DPC
  DPC performs signal processing
Thread Version (THR):
  The ISR queues a DPC that signals a thread via a semaphore
  Thread performs signal processing
  Experimented with several different priorities
Rialto/NT Version (RES):
  Same as THR, but thread scheduled using Rialto/NT real-time periodic CPU Reservation
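The hand-off used by the THR version (the ISR queues a DPC, the DPC signals a semaphore, a thread does the signal processing) can be mimicked with ordinary user-space threads. This is an analogy only: `ThreadVersionModel` and its methods are hypothetical, the "ISR" and "DPC" are plain function calls, and summing the samples stands in for the vendor's signal processing code.

```python
import threading
from collections import deque


class ThreadVersionModel:
    """User-space analogy of the THR driver structure."""

    def __init__(self):
        self._pending = deque()                    # buffers handed over by the "ISR"
        self._lock = threading.Lock()
        self._data_ready = threading.Semaphore(0)  # signaled by the "DPC"
        self.results = []
        self._worker = threading.Thread(target=self._process_loop)
        self._worker.start()

    def isr(self, samples):
        """Keep the interrupt path short: stash the buffer and defer the work."""
        with self._lock:
            self._pending.append(samples)
        self._dpc()

    def _dpc(self):
        """The deferred call only wakes the processing thread."""
        self._data_ready.release()

    def _process_loop(self):
        while True:
            self._data_ready.acquire()
            with self._lock:
                samples = self._pending.popleft()
            if samples is None:                    # sentinel: shut down
                return
            self.results.append(sum(samples))      # stand-in for signal processing

    def stop(self):
        self.isr(None)
        self._worker.join()


modem = ThreadVersionModel()
modem.isr([1, 2, 3])
modem.isr([4])
modem.stop()
print(modem.results)  # [6, 4]
```

In this structure the heavy work runs in a schedulable thread, so its priority (THR) or reservation (RES) can be chosen without changing the interrupt path.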
11
Interrupt Rate
3 different phases, interrupts very regular
[Figure: Rate of Interrupts (INT). Y-axis: milliseconds (0-35); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
Falls within PC 99 recommended interrupt rates of 3-16ms
12
Elapsed Times in ISR (INT)
PC 99 recommends that the maximum time during which a driver-based modem disables interrupts should not exceed 100 µs
Measured: 1.8 ms average, with a repeatable worst case of 3.3 ms
[Figure: Elapsed Times in Interrupt Handler (INT). Y-axis: milliseconds (0-3.5); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
13
CPU Utilization
14.7% sustained load on 450MHz Pentium II
[Figure: CPU Load. Y-axis: CPU load (0%-35%); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
14
Elapsed Times in ISR (DPC)
ISR times now small, typically < 6µs
[Figure: Elapsed Times in Interrupt Handler (DPC). Y-axis: microseconds (0-16); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
15
Elapsed Times in Queued DPC
PC 99 recommends that the total execution time required for all queued DPCs should not exceed 500 µs
But now long DPC times: 1.8ms avg., 3.3ms max (same as elapsed times in ISR for INT)
[Figure: Elapsed Times in Queued DPC (DPC). Y-axis: milliseconds (0-3.5); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
16
Samples Pending to be Processed (INT & THR 24)
Small relative to 512 sample buffer size
[Figure: Samples Pending to be Processed (INT). Y-axis: unprocessed samples (0-35); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
[Figure: Samples Pending to be Processed (THR 24). Y-axis: unprocessed samples (0-35); x-axis: time in seconds (0-30). Phases marked: On-hook, Dialing, Training, Connected.]
17
Samples Pending to be Processed (THR 8)
Unsurprisingly, contention kills modem
[Figure: Samples Pending to be Processed (THR 8). Y-axis: unprocessed samples (0-600); x-axis: time in seconds (0-35). Phases marked: On-hook, Dialing, "Please hang up and try your call again".]
18
Latency Results
Set the multimedia timers to fire once every millisecond
Register a routine to be called every millisecond
Routine does very little work:
  Stores cycle counter value and sleeps again
Histograms show differences between recorded times and ideal times
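The post-processing step described above (compare recorded wakeup times with ideal times, then build a histogram) might look like the following. This is a sketch under assumptions: the function names are hypothetical, the ideal time of wakeup i is taken to be the first timestamp plus i periods, the 100 µs bin width is arbitrary, and the timestamps are made-up values rather than measured data.

```python
from collections import Counter


def wakeup_latencies_us(timestamps_us, period_us=1000):
    """Deviation of each recorded wakeup from its ideal time.

    The ideal time of wakeup i is first_timestamp + i * period_us
    (an assumption; the slides do not define the reference point).
    """
    start = timestamps_us[0]
    return [t - (start + i * period_us) for i, t in enumerate(timestamps_us)]


def histogram(latencies_us, bin_us=100):
    """Percentage of callbacks falling in each latency bin, keyed by bin start."""
    counts = Counter((lat // bin_us) * bin_us for lat in latencies_us)
    total = len(latencies_us)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


# Synthetic timestamps in microseconds; two callbacks arrive late.
recorded = [0, 1000, 2050, 3000, 4420, 5000, 6000, 7000]
lat = wakeup_latencies_us(recorded)
print(histogram(lat))  # {0: 87.5, 400: 12.5}
```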
19
Coexisting Thread Latencies (Control Case - No Modem)
Maximum 1978µs between wakeups
[Figure: Control Case - No Modem. Histogram of percentage of callbacks (y-axis, 0.0%-3.0%) against latency in microseconds (x-axis). One value is labeled 96.8%.]
20
Coexisting Thread Latencies (INT)
Maximum 5313µs between wakeups
[Figure: INT Version. Histogram of percentage of callbacks (y-axis, 0.0%-3.0%) against latency in microseconds (x-axis). One value is labeled 83.1%.]
21
Coexisting Thread Latencies (DPC)
Maximum 4396µs between wakeups
[Figure: DPC Version. Histogram of percentage of callbacks (y-axis, 0.0%-3.0%) against latency in microseconds (x-axis). One value is labeled 82.6%.]
22
Coexisting Thread Latencies (THR 24)
Maximum 2239µs between wakeups
[Figure: THR Version (24). Histogram of percentage of callbacks (y-axis, 0.0%-3.0%) against latency in microseconds (x-axis). One value is labeled 93.8%.]
23
What Have We Learned So Far?
Signal processing in the context of the interrupt handler is:
  unnecessary
  detrimental to the latencies and predictability of coexisting activities
Vendor choice understandable:
  For any priority there is a potentially unbounded delay between the interrupt and the thread running
In practice:
  Delays are reasonable for well-configured systems [Intel OSDI ’99]
  Using interrupts is an extreme form of priority inflation
24
Two Possible Solutions
Rate Monotonic Analysis – determine the “right” priority assignments among all threads - two problems:
  Assumes cooperative priority assignment among all threads - unrealistic
  Working priority assignment dependent upon timing requirements of all threads
    Changes in application mix may require changes in priority assignments
Use a time-based real-time scheduler:
  Such as Rialto/NT
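To make the second problem concrete, here is a minimal sketch of rate-monotonic priority assignment together with the classic Liu and Layland utilization bound, n(2^(1/n) - 1). The task names and their compute times are hypothetical values chosen for illustration; only the idea that priorities depend on the timing requirements of every thread comes from the slide.

```python
def rate_monotonic_order(periods_ms):
    """Thread names from highest to lowest priority (shortest period first)."""
    return sorted(periods_ms, key=periods_ms.get)


def liu_layland_bound(n):
    """Utilization below which n periodic threads are guaranteed schedulable."""
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks):
    """tasks: name -> (compute_ms, period_ms). Sufficient, not necessary, test."""
    utilization = sum(c / p for c, p in tasks.values())
    return utilization <= liu_layland_bound(len(tasks))


# Hypothetical application mix.
tasks = {"modem": (2, 8), "audio": (1, 16), "ui": (5, 45)}
order = rate_monotonic_order({name: p for name, (_, p) in tasks.items()})
print(order, passes_utilization_test(tasks))

# Adding one more application changes the answer for the whole mix.
bigger_mix = dict(tasks, video=(20, 40))
print(passes_utilization_test(bigger_mix))
```

The second call illustrates the slide's point: a new thread can invalidate an assignment that previously worked, so the analysis has to be redone across all threads.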
25
Samples Pending to be Processed (RES 2ms/8ms – 25%)
Fits well within 512-sample buffer size
[Figure: Samples Pending to be Processed (RES 2ms/8ms). Y-axis: unprocessed samples (0-160); x-axis: time in seconds (0-35). Phases marked: On-hook, Dialing, Training, Connected.]
26
Coexisting Thread Latencies (RES 2ms/8ms – 25%)
Maximum 1971µs between wakeups
[Figure: RES Version (2ms/8ms). Histogram of percentage of callbacks (y-axis, 0.0%-7.0%) against latency in microseconds (x-axis). One value is labeled 85.5%.]
27
File Transfer Times

Version        Min      Max       Mean      Std Dev   Passed
INT            36.334   36.398    36.367    0.029     10
DPC            36.272   36.447    36.396    0.048     10
THR Pri 24     36.319   36.475    36.384    0.056     10
RES 1ms/7ms    36.333   36.724    36.426    0.112     10
RES 2ms/13ms   36.288   36.975    36.547    0.232     10
RES 2ms/14ms   38.631   91.713    65.172    37.535    2
RES 3ms/15ms   36.275   36.586    36.387    0.108     10
RES 3ms/16ms   97.289   180.415   110.523   26.408    9
RES 4ms/16ms   36.255   37.116    36.415    0.256     10
RES 8ms/20ms   36.347   36.476    36.394    0.039     10

Results for 10 copies of 200,000 bytes each
For RES 1ms/8ms, 2ms/15ms, 3ms/17ms, 4ms/17ms, and 7ms/20ms no test passed
28
Modem Reservation Ranges
Sensitivity to both percentage and gaps
If period < 12.5ms, must get 14.7% to work
If period > 12.5ms, (period – amount) <= 12.5ms must also hold
[Figure: Modem Reservation Operating Ranges. Y-axis: reservation amount in ms (0-10); x-axis: reservation period in ms (0-22). Legend: Sufficient, Marginal, Insufficient, Actual, 14.7% of CPU, 12.5ms Gaps. Regions labeled: Sufficient CPU Percentage and Frequency; Gaps Too Long; Insufficient Percentage.]
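The two conditions on this slide can be written as a single predicate: the reservation must supply at least 14.7% of the CPU, and the stretch of each period not covered by the reservation (period minus amount) must be no longer than 12.5ms. The function name is hypothetical, and the sketch encodes only the slide's rule of thumb. It agrees with most of the file transfer results (for example 8ms/20ms passes while 7ms/20ms fails), but measured behavior near the boundary can differ: 1ms/7ms is slightly under 14.7% and still passed all ten transfers.

```python
MIN_CPU_SHARE = 0.147   # sustained modem load reported in the slides
MAX_GAP_MS = 12.5       # longest tolerated stretch without reserved CPU time


def modem_reservation_ok(amount_ms, period_ms):
    """Rule-of-thumb check: enough CPU percentage and no overly long gaps."""
    share = amount_ms / period_ms
    gap_ms = period_ms - amount_ms
    return share >= MIN_CPU_SHARE and gap_ms <= MAX_GAP_MS


for amount, period in [(2, 8), (8, 20), (7, 20), (1, 8), (3, 17)]:
    print(f"{amount}ms/{period}ms ->", modem_reservation_ok(amount, period))
```

For periods under 12.5ms the gap condition holds automatically, so only the percentage matters, which matches the first rule on the slide.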
29
Soft Modem Conclusions
Signal processing in interrupt context is:
  Unnecessary
  Detrimental to the predictability and latencies of the coexisting activities
The DPC version has similar problems
Threads help alleviate these problems:
  Modem runs well with real-time priorities and non-real-time competition
  However, modem threads may interfere with other threads
Real-time scheduler allows:
  Control over modem’s degree of interference with other time-sensitive activities
  Performance isolation for threads using reservations
30
Industry Perspective
Vendor did try their own THR version:
  Worked fine during normal load
  However, modem was starved when:
    Copying data between two IDE devices
    Using USB scanner (Intel 440BX chipset) that turned off interrupts for 30-50 ms
  Therefore they shipped the INT version
Vendor is willing to be a “good citizen” only if ensured that others would be as well
Systematic latency timing verification of components is needed to enforce good behavior
31
Soft DSL is Coming
More demanding than soft modems:
  4ms processing period
G.lite:
  1.531Mbps downstream and 512Kbps upstream
  ~25% of a 600 MHz Pentium III
Full rate DSL:
  3.062Mbps downstream and 512Kbps upstream
  Nearly 50% of a 600 MHz Pentium III
Soft Bluetooth:
  period 312.5µs
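As a rough illustration of what these figures imply, the CPU percentages can be converted into a time budget per processing period and into cycles per second. This is back-of-the-envelope arithmetic under assumptions: the function names are hypothetical, the 4ms period is assumed to apply to both DSL variants, and "nearly 50%" is rounded up to 0.5.

```python
def per_period_budget_ms(cpu_share, period_ms):
    """CPU time needed in each processing period for a given sustained share."""
    return cpu_share * period_ms


def cycles_per_second(cpu_share, clock_hz):
    """Processor cycles consumed per second at a given sustained share."""
    return cpu_share * clock_hz


PERIOD_MS = 4          # processing period from the slide
PENTIUM_III_HZ = 600e6

for name, share in [("G.lite", 0.25), ("Full rate DSL", 0.5)]:
    budget = per_period_budget_ms(share, PERIOD_MS)
    cycles = cycles_per_second(share, PENTIUM_III_HZ)
    print(f"{name}: {budget} ms per {PERIOD_MS} ms period, {cycles:.0f} cycles/s")
```

Under these assumptions G.lite needs about 1ms of CPU in every 4ms and full rate DSL about 2ms, which leaves far less slack than the soft modem's reservations discussed earlier.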
32
Further Soft Modem Studies
Software-based Digital Subscriber Line (SoftDSL) studies
Multiple Soft Modems within the same machine
Similar studies on multiprocessors
34
Methodology
Empirically reverse-engineer thread requirements in a complex, legacy soft real-time application without use of source code
Assign CPU reservations to threads without modifying the application
Measure application behavior during contention
35
Windows Media Player
Default player for mp3, wav, avi, mpeg
Experimental method:
  Modelled contention using spinning thread at various priorities
  Gave CPU Reservations to media player threads
  Played an mp3 song
  Listened for glitches
  Used instrumented kernel to detect buffer under-runs
36
Media Player Thread Structure (Simplified)

Thread             Period (ms)   Priority
Kernel Mixer (*)   10            24
MP3 Decoder (*)    100           9
User Interface     45            8
Disk Reader        2000          8

(*) Received CPU Reservations in some experiments.
37
MP3 Playback w/o Contention
Kmixer thread (top) runs every 10ms
MP3 decoder (4th line) runs every 100ms
Works fine
38
Starvation Caused by Competing Thread @ Priority 10
Media Player runs only when NT priority inversion avoidance logic kicks in
39
Media Player + Reservation
1ms every 16ms reserved for decoder thread
Competing with priority 10 thread
Works fine
40
Priority Inversion Caused by Competing Thread
Competitor thread (priority 9) preempts MP3 decoder while holding Kmixer buffer lock
Kmixer misses next two time slots (x)
  Starves, causes audio glitch
Fix: raise decoder priority before grabbing lock
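The fix on this slide, raising the decoder's priority before it takes the lock, is the idea the results slide calls priority ceiling emulation. The toy model below shows only the bookkeeping: `ModelThread` and `CeilingLock` are hypothetical classes that track a number, they do not call any Windows scheduling API, and the ceiling of 24 is taken from the Kernel Mixer priority in the thread table.

```python
from contextlib import contextmanager


class ModelThread:
    """Toy stand-in for a thread; it only tracks a numeric priority."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority


class CeilingLock:
    """Lock whose holder runs at least at `ceiling` priority while inside."""

    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.holder = None

    @contextmanager
    def held_by(self, thread):
        if self.holder is not None:
            raise RuntimeError("lock already held")
        saved = thread.priority
        thread.priority = max(saved, self.ceiling)  # raise BEFORE taking the lock
        self.holder = thread
        try:
            yield
        finally:
            self.holder = None
            thread.priority = saved                 # restore after releasing


decoder = ModelThread("MP3 Decoder", priority=9)
buffer_lock = CeilingLock(ceiling=24)   # priority of the highest-priority user

with buffer_lock.held_by(decoder):
    inside = decoder.priority           # 24: a priority 9 competitor cannot preempt

print(inside, decoder.priority)         # 24 9
```

Because the holder runs at the ceiling for the whole critical section, a thread at priority 9 or 10 can no longer preempt it while Kmixer is waiting for the buffer.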
41
Media Player Deadlock
Circular wait among Media Player threads
Deadlock broken by a timeout
Fix: file a bug report…
42
Media Player Results
Expected:
  In the presence of contention, the Windows priority scheduler allows real-time apps to starve
  This can be fixed by giving real-time threads CPU Reservations
Unexpected:
  Competitor thread changes sequencing, exposes races in Media Player
  Hard to write correct programs with many threads & mutexes
  Fixed using priority ceiling emulation
43
Implications of Results
Periods of threads in complex legacy apps can be reverse engineered
  Amounts are platform-dependent and are harder
Next step is to store application requirements and use middleware to automatically assign reservations
  No application support needed
  Potentially a way around the chicken/egg problem of using reservations in a world of legacy OSs and applications
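One way to picture the proposed middleware is a lookup table of stored requirements applied to threads as they appear. Everything here is hypothetical: the table layout, the function `assign_reservations`, and the `request_reservation` callback, which stands in for whatever interface the real-time scheduler exposes. The single stored entry reuses the 1ms every 16ms decoder reservation from the Media Player experiment.

```python
# Hypothetical requirements store: (application, thread) -> (amount_ms, period_ms).
STORED_REQUIREMENTS = {
    ("Windows Media Player", "MP3 Decoder"): (1, 16),
}


def assign_reservations(running_threads, request_reservation):
    """Request a reservation for every running thread that has a stored entry.

    running_threads: iterable of (application, thread_name, thread_id).
    request_reservation: callable(thread_id, amount_ms, period_ms), a
    placeholder for the scheduler's real interface.
    Returns the thread ids for which a request was made.
    """
    assigned = []
    for app, name, tid in running_threads:
        entry = STORED_REQUIREMENTS.get((app, name))
        if entry is not None:
            request_reservation(tid, *entry)
            assigned.append(tid)
    return assigned


calls = []
result = assign_reservations(
    [("Windows Media Player", "MP3 Decoder", 101),
     ("Windows Media Player", "User Interface", 102)],
    lambda tid, amount, period: calls.append((tid, amount, period)),
)
print(result, calls)  # [101] [(101, 1, 16)]
```

The application itself is never touched, which is the property the slide highlights. Since amounts are platform-dependent, a real store would presumably need per-platform entries.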
44
Possible Continued Media Experiments
Study software DVD player:
  CPU intensive and time sensitive
45
Overall Conclusions
Status quo insufficient:
  Applications either inflate their priorities, as did the soft modem driver,
  or are at the mercy of applications that may be run at higher priorities, as is the case with the digital audio player
CPU Reservations solve this problem by allowing applications to reliably obtain the time they need while allowing other applications to do the same
46
For More Information
See Mike Jones: http://research.microsoft.com/~mbj/
or John Regehr: http://www.cs.utah.edu/~regehr/
or Stefan Saroiu: http://www.cs.washington.edu/homes/tzoompy/
Related papers at Mike’s web site