
Transcript of [Lecture Notes in Computer Science] Multi-disciplinary Trends in Artificial Intelligence Volume 7694...

C. Sombattheera et al. (Eds.): MIWAI 2012, LNCS 7694, pp. 214–223, 2012. © Springer-Verlag Berlin Heidelberg 2012

Tuning the Optimization Parameter Set for Code Size

N.A.B. Sankar Chebolu1,2, Rajeev Wankar2, and Raghavendra Rao Chillarige2

1 ANURAG, Hyderabad, India
2 Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
[email protected], [email protected], [email protected]

Abstract. Determining a nearly optimal set of optimization options for modern compilers is a combinatorial problem. Fine-tuning the parameters used by the various optimization passes for a given application, platform, and optimization objective increases this complexity further. In this paper we propose a greedy iterative approach and investigate the impact of fine-tuning the parameter set on code size. Experiments on benchmark programs from the SPEC2006 benchmark suite demonstrate that tuning the parameter values has a significant impact on code size.

1 Introduction

Modern compilers are equipped with a wide variety of sophisticated optimizations, including local, global, inter-procedural, feedback-directed, and link-time optimizations. These optimizations can also be classified as architecture dependent or architecture independent. Their objectives are mainly execution time, code size, or power. The best optimization sequence depends on the application, the optimization objective, and the target architecture.

Tuning the compiler settings, and thereby turning various compiler optimizations on or off, can yield maximal performance [1]. Sophisticated auto-tuning strategies for exploring optimization sequences are considered one of the major sources of unexploited performance improvement with existing compiler technology [2]. Recent compilers offer a large number of optimization options, and many optimization passes also have hard-coded parameters set by the compiler writer, which may not produce the most optimal code. Due to the sheer number of optimizations available and the range of parameters they can take, it is impossible to identify the best sequence by hand [3]. The search for optimization sequences and parameter values that promise a positive effect on one or more objective functions is not straightforward. Studies show that the standard optimization levels often result in poor performance [4, 5, 6, 7], so more refined approaches are needed.

This paper deals with tuning the parameter set. We consider code size as our optimization objective, since it plays a vital role in embedded systems design. Many traditional compiler optimizations are designed to reduce the execution time of compiled code, but not necessarily its size [8, 9]. Compilers for embedded systems should therefore use the best sequence of optimizations and parameter values to minimize code space.

The rest of the paper is structured as follows. Section 2 explores the compiler optimization space. Section 3 describes the experimental setup, Section 4 describes the strategy used to fine-tune the parameter set, and Section 5 discusses the effectiveness of the tuning. Finally, Section 6 summarizes the results and makes a case for future research in this area.

2 Compiler Optimization Space Exploration

As discussed, modern compilers provide a vast number of optimizations with complex mutual interactions, and they affect different objective functions, such as execution time, code size, or power, in a hardly predictable manner. For our study we consider the widely used open-source GCC compiler, version 4.5.3, and its optimization space. This compiler performs its optimization transformations in more than 200 passes over both GIMPLE and RTL representations. Some of these passes are architecture specific, and some, especially those related to constant propagation and dead-code elimination, are called multiple times.

The optimizations offered by this compiler are mainly of two types: those that apply to all architectures, controlled by -f options, and those that are target specific, controlled by -m options. GCC 4.5.3 supports around 166 optimization options of the -f type [10], which the user can turn on or off. In addition, there are 120 parameters that control these optimizations, each with a specified range of values. GCC implements the notion of optimization levels, which are umbrella options that automatically enable individual transformations. These levels, -O0 to -O3 and -Os, enable or disable certain individual optimization options. -O0 is the default level and is meant for short compile times and good debug information. -O1 to -O3 place gradually more emphasis on execution time at the cost of longer compilation time, larger code size, and poorer debugging information. The main objective of -Os is to reduce code size. The following table shows the number of optimization options enabled and disabled at the different optimization levels. However, there is no guarantee that these optimization levels will perform well on different architectures for various applications.

Table 1. Number of enabled and disabled optimizations at the different optimization levels

Optimization level    No. of Enabled Optimizations    No. of Disabled Optimizations
-O0                   46                              120
-O1                   68                              98
-O2                   92                              74
-O3                   98                              68
-Os                   93                              73
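Counts of this kind can be reproduced on a particular installation using GCC's own introspection flags, -Q --help=optimizers, which list each -f option together with its enabled/disabled state at a given level. The sketch below is only an illustration: it assumes gcc is on the PATH, and the exact output format and resulting counts vary between GCC versions.

# Sketch: count enabled/disabled -f optimizations per level using
# 'gcc -Q <level> --help=optimizers'. Assumes 'gcc' is on the PATH.
import subprocess

def count_optimizations(level):
    out = subprocess.run(
        ["gcc", "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    ).stdout
    lines = out.splitlines()
    enabled = sum("[enabled]" in line for line in lines)
    disabled = sum("[disabled]" in line for line in lines)
    return enabled, disabled

for level in ("-O0", "-O1", "-O2", "-O3", "-Os"):
    print(level, count_optimizations(level))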


Not all options are enabled even at the -O3 level, mainly because the specific transformation is still relatively immature or because it benefits only a few programs [11].

Apart from these optimization options, GCC uses various constants to control the amount of optimization done. For example, GCC limits the size of functions that can be inlined through the parameters 'max-inline-insns-single' and 'max-inline-insns-auto'. The optimization levels do not change any of these parameters; they are kept at their default values. However, GCC provides an option of the form '--param name=value', which can be used to change these parameters explicitly, where 'name' is the parameter name and 'value' is a value from its allowed range. These parameters are used by the compiler's optimization algorithms during code generation.
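As an illustration, a single parameter can be overridden alongside a standard optimization level directly on the command line; the sketch below drives such an invocation from Python (the source file name test.c and the value 300 are placeholders, not values recommended here).

# Sketch: compile a file with one --param value overridden.
import subprocess

cmd = [
    "gcc", "-Os",
    "--param", "max-inline-insns-single=300",   # override one tuning parameter
    "-o", "test", "test.c",
]
subprocess.run(cmd, check=True)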

The literature shows many attempts by researchers to tune the optimization options using various approaches, including statistical tuning [1], genetic algorithms [7], and machine learning [6]. Intuitively, the parameter set also plays an important role in achieving better performance, yet fine-tuning of the parameter set has not been considered by earlier researchers. In our experiments we study the impact of the parameter set on the specific objective of code size and determine whether the effect of parameter tuning is significant.

3 Experimental Setup

3.1 Testing Platform

An Intel Xeon E540-based 4-core system, with each core operating at 2.66 GHz and 6 MB of L1 cache, running Fedora release 14 (Laughlin) with Linux kernel 2.6.35, is used for the experimentation. The test cases were selected from CINT2006 of SPEC 2006 [12]. These are compute-intensive benchmark programs written in C, detailed in the following table.

Table 2. Experimentation programs with short descriptions

Program       Description
specrand      Random Number Generation
hmmer         Search Gene Sequence
sjeng         Artificial Intelligence: Chess
libquantum    Physics: Quantum Computing

3.2 Parameter Set

The parameter set of GCC 4.5.3 includes around 120 parameter names, each with a specific default value fixed by the compiler writers. The minimum value, and sometimes the allowed maximum value, are also specified so that the user can fine-tune these values within the allowed ranges using the '--param' option. Some parameter values can be fixed directly based on the features and properties of the host processor system.


Table 3. List of all param <name,value> pairs considered

S.No  Param                                    S.No  Param
1     struct-reorg-cold-struct-ratio           49    max-iterations-to-track
2     predictable-branch-outcome               50    hot-bb-count-fraction
3     max-crossjump-edges                      51    hot-bb-frequency-fraction
4     min-crossjump-insns                      52    max-predicted-iterations
5     max-grow-copy-bb-insns                   53    align-threshold
6     max-goto-duplication-insns               54    align-loop-iterations
7     max-delay-slot-insn-search               55    tracer-dynamic-coverage
8     max-delay-slot-live-search               56    tracer-dynamic-coverage-feedback
9     max-gcse-memory                          57    tracer-max-code-growth
10    max-pending-list-length                  58    tracer-min-branch-ratio
11    max-inline-insns-single                  59    max-cse-path-length
12    max-inline-insns-auto                    60    max-cse-insns
13    large-function-insns                     61    max-reload-search-insns
14    large-function-growth                    62    max-cselib-memory-locations
15    large-unit-insns                         63    max-sched-ready-insns
16    inline-unit-growth                       64    max-sched-region-blocks
17    ipcp-unit-growth                         65    max-pipeline-region-blocks
18    max-inline-insns-recursive               66    max-sched-region-insns
19    max-inline-insns-recursive-auto          67    max-pipeline-region-insns
20    max-inline-recursive-depth               68    min-spec-prob
21    max-inline-recursive-depth-auto          69    max-sched-extend-regions-iters
22    min-inline-recursive-probability         70    max-sched-insn-conflict-delay
23    early-inlining-insns                     71    sched-spec-prob-cutoff
24    max-early-inliner-iterations             72    sched-mem-true-dep-cost
25    min-vect-loop-bound                      73    selsched-max-lookahead
26    max-unrolled-insns                       74    selsched-max-sched-times
27    max-average-unrolled-insns               75    max-last-value-rtl
28    max-unroll-times                         76    integer-share-limit
29    max-peeled-insns                         77    min-virtual-mappings
30    max-peel-times                           78    virtual-mappings-ratio
31    max-completely-peeled-insns              79    max-jump-thread-duplication-stmts
32    max-completely-peel-times                80    max-fields-for-field-sensitive
33    max-completely-peel-loop-nest-depth      81    prefetch-latency
34    max-unswitch-insns                       82    simultaneous-prefetches
35    max-unswitch-level                       83    min-insn-to-prefetch-ratio
36    lim-expensive                            84    prefetch-min-insn-to-mem-ratio
37    iv-consider-all-candidates-bound         85    switch-conversion-max-branch-ratio
38    iv-max-considered-uses                   86    sccvn-max-scc-size
39    iv-always-prune-cand-set-bound           87    ira-max-loops-num
40    scev-max-expr-size                       88    ira-max-conflict-table-size
41    omega-max-vars                           89    ira-loop-reserved-regs
42    omega-max-geqs                           90    loop-invariant-max-bbs-in-loop
43    omega-max-eqs                            91    max-vartrack-size
44    omega-max-wild-cards                     92    ipa-sra-ptr-growth-factor
45    omega-hash-table-size                    93    graphite-max-nb-scop-params
46    omega-max-keys                           94    graphite-max-bbs-per-function
47    vect-max-version-for-alignment-checks    95    loop-block-tile-size
48    vect-max-version-for-alias-checks

For example, parameters such as 'L1-cache-size', 'L1-cache-line-size', and 'L2-cache-size' were fixed to their actual values and removed from our study. The values of certain parameters are binary in nature, while the rest have many possible alternatives. There are around 12 parameters that are supported by GCC 4.5.3 but are not covered in the GCC documentation. These include 'max-variable-expansions-in-unroller', 'gcse-after-reload-partial-fraction', 'gcse-after-reload-critical-fraction', 'max-once-peeled-insns', 'max-iterations-computation-cost', 'sms-max-ii-factor', 'sms-dfa-history', 'sms-loop-average-count-threshold', 'tracer-min-branch-probability-feedback', 'tracer-min-branch-probability', 'selsched-insns-to-rename', and 'slp-max-insns-in-bb'. These parameters were not considered in our experimentation due to the lack of documentation and were fixed to their default values. Parameters meant to decrease compilation time, such as 'ggc-min-expand' and 'ggc-min-heapsize', were also kept at their default values. Other parameters not relevant to the optimization objective, for example 'ssp-buffer-size', which is meant to protect against stack-smashing attacks, were not considered either. The resulting list of 95 parameters considered for the study is presented in Table 3.
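One convenient way to organize such a study is to record, for each parameter under consideration, its default value and allowed range. The structure below is purely illustrative: the names are real GCC parameters from Table 3, but the numeric triples are placeholders that should be replaced with the defaults and bounds documented for the installed compiler.

# Illustrative only: (default, min, max) triples for a few of the 95
# parameters; actual values must be taken from the GCC 4.5.3 documentation.
PARAM_RANGES = {
    "max-inline-insns-single": (300, 0, 1000),
    "max-inline-insns-auto":   (50, 0, 1000),
    "max-unroll-times":        (8, 0, 32),
    # ... remaining parameters from Table 3 ...
}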

4 Fine-Tuning Strategy

The strategy employed to fine-tune the parameter set is along the lines of greedy iterative compilation. The crux of the iterative compilation approach is to explore the optimization space iteratively by constructing optimization sequences and measuring their effectiveness. Here, an optimization sequence consists of setting a suitable value for each parameter of the parameter set (based on the strategy adopted) from its respective allowed range. As the objective under consideration is code space, the effectiveness of an optimization sequence is measured by the code size of the executable obtained after compiling with that sequence. The code size is the sum of the text and data section sizes, obtained using the size command.
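A minimal sketch of this measurement, assuming gcc and the GNU size utility are available on the PATH, could look as follows; the base level -Os, the output name a.out, the file bench.c, and the example flag are placeholders, not the exact setup used here.

# Sketch: compile with a candidate flag list and return text + data size.
import subprocess

def code_size(source, extra_flags):
    subprocess.run(["gcc", "-Os", *extra_flags, "-o", "a.out", source], check=True)
    out = subprocess.run(["size", "a.out"], capture_output=True, text=True,
                         check=True).stdout
    # GNU size (Berkeley format): header line, then "text data bss dec hex filename"
    text, data = out.splitlines()[1].split()[:2]
    return int(text) + int(data)

print(code_size("bench.c", ["--param", "max-unroll-times=4"]))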

For each test case the experimentation is repeated for a fixed number of iterations. In each iteration, the value of every individual parameter is tuned within its allowed range by checking its effectiveness, and the best value of that parameter is obtained. While one parameter is being tuned, the rest of the parameters are kept at their current default values. The set of best values obtained for the entire parameter set becomes the default value set for the next iteration; during the first iteration, the default values provided by the compiler are used. In each iteration the best values are compared with the current default values, and their absolute and signed ('real') differences are measured. The sums of these differences over the whole parameter set are calculated for each iteration and referred to as the iteration values. The series of iteration values indicates the level of convergence, and the parameter set corresponding to the convergence point is taken as the best parameter value set.
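The sketch below illustrates one way such iteration values could be computed for a single iteration; normalizing each difference by the parameter's default value is an assumption made here for illustration, since the exact normalization is not spelled out above.

# Sketch: per-iteration convergence measure over two {name: value} dicts.
def iteration_values(defaults, bests):
    real_sum, abs_sum = 0.0, 0.0
    for name, default in defaults.items():
        scale = default if default else 1          # avoid division by zero
        diff = (bests[name] - default) / scale     # normalization is an assumption
        real_sum += diff
        abs_sum += abs(diff)
    return abs_sum, real_sum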

Algorithm: Greedy-based iterative fine-tuning algorithm
Input: Benchmark programs and the GCC compiler with its optimization parameter set
Output: Optimum values for the entire parameter set

Begin
  Repeat for fixedNumberOfIterations {
    If iteration == 0 then
      defaultParamSet[] = compilerDefaultParamSet[]
    else
      defaultParamSet[] = preIterationBestParamSet[]
    For each parameter param[i] in paramSet {
      Obtain the best value of param[i] from its allowed range
      preIterationBestParamSet[i] = param[i]
    }
    Impact = codesize(defaultParamSet[]) - codesize(preIterationBestParamSet[])
  }
  Analyze the Impact values and obtain the best parameter set values
End
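A compact sketch of this greedy loop in Python is given below. It is only an illustration of the algorithm above, not the authors' implementation: the code_size helper, the use of -Os as the base level, the placeholder PARAM_RANGES table from Section 3.2, and the choice of three candidate values per parameter (minimum, midpoint, maximum) are simplifying assumptions.

# Sketch of the greedy iterative fine-tuning loop described above.
import subprocess

def param_flags(values):
    # Turn a {name: value} dict into a list of --param arguments.
    flags = []
    for name, val in values.items():
        flags += ["--param", f"{name}={val}"]
    return flags

def code_size(source, flags):
    # text + data size of the executable built with the given flags.
    subprocess.run(["gcc", "-Os", *flags, "-o", "a.out", source], check=True)
    out = subprocess.run(["size", "a.out"], capture_output=True, text=True,
                         check=True).stdout
    text, data = out.splitlines()[1].split()[:2]
    return int(text) + int(data)

def tune(source, param_ranges, iterations=15):
    # Start from the compiler's default parameter values.
    current = {name: d for name, (d, lo, hi) in param_ranges.items()}
    for it in range(iterations):
        best = {}
        for name, (d, lo, hi) in param_ranges.items():
            # Tune one parameter at a time; all others stay at the current defaults.
            candidates = sorted({lo, (lo + hi) // 2, hi, current[name]})
            sizes = {v: code_size(source, param_flags({**current, name: v}))
                     for v in candidates}
            best[name] = min(sizes, key=sizes.get)
        impact = (code_size(source, param_flags(current))
                  - code_size(source, param_flags(best)))
        print(f"iteration {it}: size reduction {impact} bytes")
        current = best  # best values become the defaults for the next iteration
    return current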

5 Analysis of the Results

The following figures illustrate the impact of tuning the parameter set with respect to the size objective.


Fig. 1. Sum of the absolute and real differences between best and default normalized parameter values at each iteration for Specrand program

Fig. 2. Sum of the absolute and real differences between best and default normalized parameter values at each iteration for hmmer program


Fig. 3. Sum of the absolute and real differences between best and default normalized parameter values at each iteration for Libquantum program

Fig. 4. Sum of the absolute and real differences between best and default normalized parameter values at each iteration for Sjeng program


Table 4. Code Size of benchmark programs (in terms of number of bytes) at various standard Optimization levels with default and fine-tuned parameter set values

Benchmark Program   Standard Optimization Level   With Default Parameter Set   With Fine-tuned Parameter Set
Specrand            -O0                           5486                         5486
                    -O1                           5416                         5416
                    -O2                           5448                         5448
                    -O3                           5448                         5448
                    -Os                           5302                         5302
hmmer               -O0                           335224                       335224
                    -O1                           264067                       259971
                    -O2                           272775                       260133
                    -O3                           304679                       280393
                    -Os                           222963                       223887
Libquantum          -O0                           45490                        45490
                    -O1                           38430                        38414
                    -O2                           40946                        38414
                    -O3                           45042                        43870
                    -Os                           33584                        33632
Sjeng               -O0                           166004                       166004
                    -O1                           140496                       140080
                    -O2                           143952                       135792
                    -O3                           157487                       151759
                    -Os                           116958                       117344

Observations: It is observed that the parameter set converges and is fine-tuned to a near-optimum level. Interestingly, for the program specrand there is absolutely no difference in code size, irrespective of changes in the parameter values. For the program hmmer the parameter set was tuned after 11 iterations; similarly, the programs libquantum and sjeng obtained their best parameter sets at the 9th and 13th iterations, respectively. Table 4 provides the code sizes of these benchmark programs with the default and fine-tuned parameter set values at the standard optimization levels. It is observed that the fine-tuned parameter set results in smaller code sizes than the default parameter values for most of the test cases, especially at the standard optimization levels -O0 to -O3. However, at the -Os level the default parameter set fares slightly better than the fine-tuned parameter set, and the reasons for this need further investigation. It is also evident from the experimental data that, with respect to the code-size objective, only 14 of the 95 parameters have a significant impact on the output. These 14 parameters are: 'predictable-branch-outcome', 'max-inline-insns-auto', 'early-inlining-insns', 'max-iterations-to-track', 'hot-bb-frequency-fraction', 'align-threshold', 'align-loop-iterations', 'max-predicted-iterations', 'min-crossjump-insns', 'max-grow-copy-bb-insns', 'max-cse-path-length', 'lim-expensive', 'iv-consider-all-candidates-bound', and 'max-jump-thread-duplication-stmts'. It is also observed that the result of the size-related experimentation is exactly the same even if the default optimization level is changed from -O2 to -Os.

Further to this, a statistical analysis based on Analysis of Variance (ANOVA) was carried out on the code size values of all four selected benchmark programs, at all standard optimization levels, with the default parameter set and with the best values obtained through this fine-tuning exercise. The idea is to check whether the impact of the fine-tuned parameter set is significant or not. The results of this statistical analysis confirm that tuning the parameter set plays a significant role.
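As an illustration of how such a check could be carried out, the sketch below runs a one-way ANOVA (scipy.stats.f_oneway) on code sizes taken from Table 4, treating the default and fine-tuned results as two groups; this grouping and the subset of rows used are assumptions made for illustration and may differ from the exact analysis performed.

# Sketch: one-way ANOVA on code sizes with default vs. fine-tuned parameters.
# Values are the hmmer and sjeng rows of Table 4 at -O1, -O2, -O3, where the
# default and tuned parameter sets actually differ.
from scipy import stats

default_sizes = [264067, 272775, 304679, 140496, 143952, 157487]
tuned_sizes   = [259971, 260133, 280393, 140080, 135792, 151759]

f_stat, p_value = stats.f_oneway(default_sizes, tuned_sizes)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")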

6 Summary and Future Work

This study brings out the fact that fine-tuning the parameter set, in addition to the optimization options, is necessary to obtain the best results. It is also evident from the study that 14 of the 95 parameters studied play a significant role with respect to code size. This fine-tuning strategy can be applied to the execution-time objective, and the interdependencies between these two objectives can also be studied. The fine-tuning strategies can be further researched and applied to achieve the global optimum.

References

1. Haneda, M., Knijnenburg, P.M.W., Wijshoff, H.A.G.: Automatic Selection of Compiler Options Using Non-Parametric Inferential Statistics. In: 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005) (2005)
2. Adve, V.: The Next Generation of Compilers. In: Proc. of CGO (2009)
3. Duranton, M., Black-Schaffer, D., Yehia, S., De Bosschere, K.: Computing Systems: Research Challenges Ahead, The HiPEAC Vision 2011/2012
4. Kulkarni, P.A., Hines, S.R., Whalley, D.B., et al.: Fast and Efficient Searches for Effective Optimization-phase Sequences. Transactions on Architecture and Code Optimization (2005)
5. Leather, H., O'Boyle, M., Worton, B.: Raced Profiles: Efficient Selection of Competing Compiler Optimizations. In: Proc. of LCTES (2009)
6. Agakov, F., Bonilla, E., Cavazos, J., et al.: Using Machine Learning to Focus Iterative Optimization. In: Proc. of CGO (2006)
7. Cooper, K.D., Schielke, P.J., Subramanian, D.: Optimizing for Reduced Code Space Using Genetic Algorithms. SIGPLAN Not. 34(7) (1999)
8. Khedkar, U., Govindrajan, R.: Compiler Analysis and Optimizations: What is New? In: Proc. of HiPC (2003)
9. Beszédes, Á., Gergely, T., Gyimóthy, T., Lóki, G., Vidács, L.: Optimizing for Space: Measurements and Possibilities for Improvement. In: Proc. of GCC Developers Summit (2003)
10. GCC, the GNU Compiler Collection - online documentation, http://gcc.gnu.org/onlinedocs/
11. Novillo, D.: Performance Tuning with GCC. Red Hat Magazine (September 2005)
12. SPEC - Standard Performance Evaluation Corporation, http://www.spec.org/cpu2006