On Workload in an SCA-based System, with Varying Component and Data Packet Sizes
Tore Ulversøy (1)
Jon Olavsson Neset (2)
(1) FFI; UNIK University Graduate Center; University of Oslo (UiO)
(2) Norwegian University of Science and Technology (NTNU)
Outline
• Background and Problem Definition
• Empirical Analysis
• Analysis using Low-Complexity Analytical Models
• Conclusions
Background
• The base code of one of the waveform applications used in the following originates from a member of, and the waveform application is also used for other activities in, the Regular Task Group on SDR founded under RTO-IST-080
• RTO-IST-080 RTG-038 Software Defined Radio: currently, the team consists of experts from government, university and industry from CA, DK, GE, HU, IT, NL, NO, SP, TU, US and the SDR Forum
• Headed by NL (Chairman: Hans Segers, TNO)
Background: Main Objectives of RTG-038
[Figure: images © IBM/Lenovo, © Rockwell Collins, © Spectrum Signal Processing]
1. Share knowledge & experience of (multi)national SDR/SCA developments
2. Report on possibilities of sharing waveforms and waveform components
3. Investigations of portability and interoperability:
– SCA-based implementation of STANAG 4285 waveform
– demonstrate portability onto national SDR platforms
– demonstrate interoperability between the different implementations
Problem Definition and Problem Background
• SCA defines an environment that allows applications to be built as compositions of SW components (and devices)
• SCA defines a distributed system, with communication through CORBA for CORBA-capable processors
• There is wide freedom as to how small the components are that the application is split into: with many small components, reuse of components becomes easier, but CPU overhead increases
[Figure: Application/Component View (components C1 to C10) and Physical View (Processor 1, Processor 2, Processor 3)]
• What are the CPU overhead effects of a fine structure (many components) relative to a coarse one (few components), and how can we predict this overhead?
Analysis Approach:
CPU workload implied by a task or a group of tasks = the fraction of available processor cycles occupied over a time period
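As a small illustration of this definition, the sketch below turns a count of occupied cycles into a workload percentage. The function name and its arguments are assumptions made for illustration and are not taken from the original analysis; the 1.86 GHz cycle rate matches the Pentium M used in the experiments.

```python
def workload_percent(busy_cycles: float, cycle_rate_hz: float, period_s: float) -> float:
    """Fraction of available processor cycles occupied over a period, in percent."""
    available_cycles = cycle_rate_hz * period_s
    if available_cycles <= 0:
        raise ValueError("cycle rate and period must be positive")
    return 100.0 * busy_cycles / available_cycles


# Hypothetical example: 186 million busy cycles during one second on a 1.86 GHz core.
example_wl = workload_percent(186e6, 1.86e9, 1.0)
```

Tools such as sar report this quantity directly, split into user and system parts; the sketch only makes the underlying ratio explicit.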
Analysis methods, ordered from increasing clarity / simplicity (first) to increasing accuracy (last):
• Analytical model
• Concurrency model, e.g. Petri-net
• Simulation model
• Measurements on testbed
• Measurements on actual system
Empirical Analysis
• Using OSSIE (Open Source SCA Implementation Embedded) [2] from VirginiaTech, which uses omniORB [3]
– Advantages: Low user-threshold, full source-code available, Linux-based
• Profiling and monitoring tools:
– OProfile [4]
– SYSSTAT sar [5]
Parameter         Experiment: Stanag 4285       Experiment: Synthetic waveform
OS                Linux 2.6.9-34.EL             Linux 2.6.9-34.EL
OSSIE revision    0.6.0                         0.6.2
Processor         Pentium M 1.86GHz             Pentium M 1.86GHz
RAM               1,5 GByte                     1,5 GByte
Cache specifics   L1i=32kB L1d=32kB L2=2MB      L1i=32kB L1d=32kB L2=2MB
L1i = first level instruction cache, L1d = first level data cache, L2 = second level cache
Empirical Analysis, Simple Waveform Application
• Stanag 4285, TX part. Base code provided by Telefunken Racoms for RTO-IST-080 RTG-038
• Implemented as three different configurations, all performing the same functional processing work
• Non-SCA ‘c’ version as a reference
[Figure: the three configurations, driven by a packet rate regulator]
2 comp.: Stanag4285 TX, Data Sink
7 comp.: Data Source, FEC Encoder, Interleaver, Symbol Mapper & Scrambler, Symbol to I/Q & TX Filter, Float to fixed converter, Data Sink
11 comp.: the components of the 7-component configuration plus four Forwarder components
Empirical Analysis, Stanag 4285 TX: Results, User
[Figure: User CPU % (axis 0 to 30) versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]
WL measured by SYSSTAT sar (sar -u 40 5)
Empirical Analysis, Stanag 4285 TX: Results, User+System
[Figure: User + System CPU % (axis 0 to 45) versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]
WL measured by SYSSTAT sar (sar -u 40 5)
Empirical Analysis, Synthetic Application
• A total of 9 FIR-filters, N taps and packet size B (NxB mult/adds per FIR)
• Both N and B can be varied
• 4 different configurations
• ‘c’ version as a reference
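To make the NxB operation count concrete, here is a minimal sketch of a direct-form FIR filter applied to one packet, with a counter for multiply/add operations. The source does not show the filter code of the synthetic waveform, so the function names, the zero initial state and the assumption that the nine filters run in series are all illustrative.

```python
from typing import List, Tuple


def fir_packet(samples: List[float], taps: List[float]) -> Tuple[List[float], int]:
    """Filter one packet with a direct-form FIR; return (output, mult/add count).

    Samples before the start of the packet are treated as zero, but the
    multiply/add is still counted, so the total is exactly N * B.
    """
    output = []
    mult_adds = 0
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            x = samples[i - k] if i - k >= 0 else 0.0
            acc += tap * x
            mult_adds += 1
        output.append(acc)
    return output, mult_adds


def fir_chain(samples: List[float], taps: List[float], n_filters: int = 9) -> Tuple[List[float], int]:
    """Run a packet through n_filters identical FIR stages in series (assumed layout)."""
    data, total_ops = samples, 0
    for _ in range(n_filters):
        data, ops = fir_packet(data, taps)
        total_ops += ops
    return data, total_ops
```

With this structure, the functional work per packet is 9 x N x B multiply/adds regardless of how the nine filters are grouped into components, which is what allows the configurations to be compared on overhead alone.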
[Figure: the four configurations, driven by a packet rate regulator]
W2: FTOT, SNK
W3: SRC, F1TO9, SNK
W5: SRC, F123, F456, F789, SNK
W11: SRC, F1, F2, F3, F4, F5, F6, F7, F8, F9, SNK
Synthetic Application: WL versus Configuration and N
WL results measured by SYSSTAT sar (sar -u 40 3)
[Figure: CPU Workload versus Configuration, B=2000, PR=40. CPU WL [%] (axis 0 to 50), split into User and System, for configurations FUNC, W2, W3, W5 and W11, each at N=10 and N=50.]
Ratios annotated in the figure:
• Ratio, user: 1,67; Ratio, user+syst: 1,94
• Ratio, user: 1,10; Ratio, user+syst: 1,17
Synthetic Application: WL versus Packet Size
[Figure: CPU Workload versus Packet Size. CPU WL [%] (axis 0 to 35) against Packet Size (0 to 70000); series FUNC, W2, W3, W5 and W11, each as U (user) and U+S (user+system).]
• Packet rate: 10/sec
• N and B selected such that ‘C’ implementation (FUNC) is at 10±0,3% user CPU WL
• WL overhead is seen to increase significantly with B
The Simple Lower Bound Model (SLBM)
• Ideal, unrealistically optimistic model
• Serves as a lower bound
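• t_i = number of cycles per packet
$$
WL\%_i = \frac{PR \left[\, 9\,t_{CL} + t_{SRC} + t_{SNK} + (M-1)\,t_{packet} + (M-1)\,t_{TS} + (M-2)\,t_{TF} \,\right]}{PCR} \cdot 100
$$
where PR is the packet rate, M the number of components and PCR the cycle rate of the processor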
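[Figure: timeline of one packet period for adjacent components C_N and C_N+1. Processing starts on timer strobe; the segments t_SRC, t_TS, t_packet, t_TF, t_CL and t_SNK follow one another, and the processor is idle for the rest of the period.]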
Parameters in the Simple Lower Bound Model
• For simplicity, we measure the parameters in the model with OProfile and/or SYSSTAT sar, using test applications:
– CORBA test application: CORBATSTSRC, CORBATSTSINK
– ‘c’-prog: for (i=0; i < BLSZ; i++) ......

Parameter estimates (cycles per packet):
Mode     Method              t_SRC       t_SNK       t_TS(B)   t_TF(B)   t_packet(B)
User     sar estimate        Assumed 0   Assumed 0   14,5*B    10,6*B    79200+5,5*B
User     OProfile estimate   650         200         14,6*B    10,6*B    80700+5,5*B
System   sar estimate        Assumed 0   Assumed 0   ≈ 0       ≈ 0       160000+23,6*B
System   OProfile estimate   ≈ 0         ≈ 0         ≈ 0       ≈ 0       150000+23,0*B
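The lower bound model can be evaluated with a few lines of code. The sketch below is one reading of the model, not the authors' implementation: it takes the overhead per packet period as (M-1) packet transfers, (M-1) to-sequence conversions and (M-2) from-sequence conversions, uses the sar user-mode estimates from the parameter table, and assumes that the functional work (the 9 t_CL term) is tuned to a fixed 10 % user workload, as in the packet size experiment. All function and parameter names are hypothetical.

```python
PCR = 1.86e9  # processor cycle rate [cycles/s], Pentium M 1.86 GHz


def slbm_user_overhead_cycles(m: int, b: int) -> float:
    """User-mode overhead cycles per packet period (sar user estimates)."""
    t_ts = 14.5 * b               # to-sequence conversion
    t_tf = 10.6 * b               # from-sequence conversion
    t_packet = 79200 + 5.5 * b    # CORBA packet transfer
    return (m - 1) * t_packet + (m - 1) * t_ts + (m - 2) * t_tf


def slbm_user_workload_percent(m: int, b: int, packet_rate: float,
                               functional_wl_percent: float = 10.0,
                               t_src: float = 0.0, t_snk: float = 0.0) -> float:
    """Lower-bound user workload [%]: functional part plus modelled overhead."""
    overhead = t_src + t_snk + slbm_user_overhead_cycles(m, b)
    return functional_wl_percent + 100.0 * packet_rate * overhead / PCR


# W11 (M=11) at 10 packets/s with B=60000 samples per packet.
wl_w11 = slbm_user_workload_percent(m=11, b=60000, packet_rate=10.0)
```

Under these assumptions the W11 case evaluates to roughly 20 % user workload at B=60000, that is, about 10 percentage points of modelled overhead on top of the functional work.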
Results, SLBM
• The simple model describes the dominating part of the user CPU overhead. Agreement best for small packet sizes
[Figure: Comparison of W11 Measured Data and Simple Lower Bound Model. WL User [%] (axis 10 to 24) against Packet Size B (0 to 6 x 10^4); series: Measured, SLBM.]
M=11, Packet rate = 10/sec
N and B selected such that ‘C’ implementation (FUNC) is at 10±0,3% user CPU WL
The Context Switch Model (CSM)
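$$
WL\%_{cs} = WL\%_i + \frac{CSR \left( t_{CSD} + t_{CSI} \right)}{PCR} \cdot 100
$$
• WL%_i: workload from the Simple Lower Bound Model
• CSR: Context Switch Rate [switches/second]. Measured ≈ 1300 for the example on the next page
• t_CSD: CS Direct Cost [cycles]. Here: ≈ 5 µsec = 9300 cycles
• t_CSI: CS Indirect Cost [cycles]. Here: using a coarse estimate based on addressed space and memory speed
• PCR: Cycle rate of the processor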
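A short sketch of the context switch term, read here as CSR (t_CSD + t_CSI) / PCR added on top of the lower bound workload. The function name and argument names are hypothetical, and the value 1300 is taken to be the measured context switch rate in switches per second, which is an interpretation of the slide annotation. The indirect cost defaults to zero, which corresponds to a direct-cost-only estimate.

```python
def csm_workload_percent(wl_ideal_percent: float, cs_rate: float,
                         t_csd: float, t_csi: float = 0.0,
                         pcr: float = 1.86e9) -> float:
    """Workload [%] after adding context switch cost to the lower-bound workload.

    cs_rate: context switches per second
    t_csd:   direct cost per switch [cycles]
    t_csi:   indirect cost per switch [cycles], e.g. cache refill
    pcr:     processor cycle rate [cycles/s]
    """
    return wl_ideal_percent + 100.0 * cs_rate * (t_csd + t_csi) / pcr


# Direct cost only: 1300 switches/s at about 9300 cycles per switch.
direct_only = csm_workload_percent(0.0, cs_rate=1300, t_csd=9300)
```

With these numbers the direct cost alone contributes well under one percentage point, so under this reading any larger gap between the lower bound and the measurements would have to come from the indirect cost term.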
Results, CSM
• With the CS model, we better explain the measured WL
[Figure: Comparison of W11 Measured Data and CS Model with Approximate Parameters. WL User [%] (axis 10 to 24) against Packet Size B (0 to 6 x 10^4); series: Measured; CSM, only t_CSD, coarse estimate; CSM example (coarse parameter estimates).]
M=11, Packet rate = 10/sec
N and B selected such that ‘C’ implementation (FUNC) is at 10±0,3% user CPU WL
Conclusions:
• We have used empirical analysis and simple analytical models to understand the effects of granularity in an SCA-based system with CORBA-capable processors
• When executing the same total functional processing work, we observe that the processor workload increases as the number of components increases
• This overhead increases with data packet size, and becomes more dominant the less functional work there is per packet
• Contributors: Data conversions, packet communication through CORBA, direct cost of context switches, indirect cost of context switches
• Hence the scalability and reusability benefits that result from implementing the SDR application with a high number of components must be balanced against the processing efficiency loss that occurs when several components have to run on the same processor
• Two simple models are described that help explain the major effects, and may be used to calculate the overhead
References:
[1] P. J. Fortier and H. E. Michel, Computer Systems Performance Evaluation and Prediction. Amsterdam: Digital Press, 2003.
[2] VirginiaTech, OSSIE development site for software-defined radio, http://ossie.wireless.vt.edu/trac as of Dec. 20 2007
[3] omniORB, http://omniorb.sourceforge.net/ as of Feb. 29 2008
[4] OProfile - A System Profiler for Linux, http://oprofile.sourceforge.net as of Feb. 29 2008
[5] SYSSTAT, http://pagesperso-orange.fr/sebastien.godard/ as of Feb. 29 2008
[6] Chuanpeng Li, Chen Ding, and Kai Shen, "Quantifying The Cost of Context Switch," in ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science, San Diego, CA, 13-14 June 2007.
Questions?