Optimization Level 2 Tutorial - RevisionJan26...memory without checking for limits, target your...

Applying XL C Compiler Optimization on AIX: Optimization Level 2


Contents

Introduction

Part 1 – Introducing optimization

Part 2 – Optimization level 2

Part 3 – Optimizing C code at level 2 using compiler options

Part 4 – Application of level 2 optimization

Summary


Introduction

This is the first of a series of tutorials that introduces the optimization features of the XL C/C++ compiler on AIX. These tutorials are intended to be short and concise to give you a quick lead into how to use these optimization features. Each part gives a brief introduction to certain compiler features. For more specific information, you can use the summary at the end of the document for quick reference or refer to listed references in each part for more detailed information.

This tutorial introduces optimizing C code at level 2 using the XL C compiler. You will learn about the compiler options that help you optimize your code at level 2. You will also learn how to make code changes that address the common code problems uncovered at this level of optimization. Finally, the focus of this tutorial is an example of applying optimization level 2 to an application.

Learning objectives

• Explain what you need to do before optimizing code
• Describe what optimization level 2 does and why you would want to use it
• Optimize C code at level 2
• Address common C/C++ code problems using compiler options and rewriting code

Time required

This tutorial should take approximately 30 minutes to finish. If you explore other concepts related to this tutorial, it could take longer to complete.

Skill level

Beginner

Audience

C application programmers

System requirements

• XL C for AIX®, V10.1 or XL C/C++ for AIX, V10.1
• AIX V5.3 TL 5300-06, AIX V6.1 or IBM® i V6.1 PASE

Note: Using XL C for AIX, V10.1 or XL C/C++ for AIX, V10.1 is important because the compiler version determines the default values for compiler optimization. These default values differ from one version of the compiler to another.


Part 1 - Introducing optimization

The optimizer includes five base optimization levels: -O0, -O2, -O3, -O4, and -O5 to generate faster, more efficient applications. These levels allow you to choose from minimal optimization to intense program analysis that provides benefits even across programming languages. Optimization analyses range from local basic block to subprogram to file-level to whole-program analysis. The higher the optimization level, the more intense the program analysis becomes as increasingly sophisticated optimization techniques are applied to your code.

Getting the best optimization is a process of moving forward, then slowly backing off when you encounter problems. With any combination of code and compiler, either the code or the compiler can contain problems that prevent the code from reaching the highest optimization level offered by the compiler. This is why you should always begin at level 0 and make sure that any code issues are fixed and the code is as efficient as possible, in order to get the most out of high-level optimizations.

This document will focus on optimization level 2. For more information about the different levels of optimization, what levels work best with code and the trade-offs to consider, see the paper Optimizing C code at optimization level 2 at the Rational C/C++ café (ibm.com/rational/cafe).


Part 2 - Optimization level 2

Optimization level 2 is the first level of optimization provided by the XL family of compilers. You can invoke this level with the -O2 or -O compiler options:

xlc -O2 source.c

After successfully compiling, executing, and debugging your application without optimization, recompiling at -O2 opens your application to a set of comprehensive low-level transformations that apply to subprogram or compilation unit scopes and can include some inlining. Optimizations at level 2 are a relative balance between increasing performance while limiting the impact on compilation time and system resources.

While increasing the level at which you optimize your application can provide an increase in performance, other compiler options can be just as important as -O2.

You can find more information about these options in the IBM XL C/C++ for AIX, V10.1 Optimization and Programming Guide, SC23-8833-00.

Example:

As an illustration, the following source code can be compiled without any optimization arguments (use xlc -o noopt foo.c) or with optimization level 2 (use xlc -O2 -o opt2 foo.c).

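int main(){
    long long size = 1000000;
    int a[size], b[size];
    // Fill a with ascending and b with descending values
    for(int i=0; i<size; ++i){
        a[i] = i;
        b[i] = size - i;
    }
    int x = 0;
    // Accumulate into x; a traverses forward, b backward
    for (int i=0 ; i<size ; ++i){
        x = x + a[i] - b[size-i-1];
    }
    return x;
}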

The unoptimized, default compile follows all the source code instructions literally. In this case, the compiler generates load and store operations for each update to x in the second loop. With optimization level 2, the compiler keeps the intermediate values of x in a register, avoiding memory updates until the final value of x is computed. Furthermore, the addressing of arrays a and b is analyzed to avoid expensive multiplications inside the body of the loop. For a more complete and detailed list of code changes performed at -O2, see the paper Optimizing C code at optimization level 2, available at the Rational C/C++ café (ibm.com/rational/cafe).

You can observe the differences in the generated code between compiler options by adding -qlist to the compile commands above and checking the assembly listing, in this case called foo.lst.
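To make the two transformations concrete, the following sketch writes them out by hand in C. It only illustrates the idea (keeping the running total in a local variable and stepping pointers instead of recomputing indexed addresses); it is not the code that XL C actually generates. The function names sum_naive and sum_transformed are hypothetical.

```c
#include <stddef.h>

/* Literal form: the running total lives in memory (*total), and every
 * element address is computed from an index on each iteration. */
void sum_naive(const int *a, const int *b, size_t size, int *total)
{
    *total = 0;
    for (size_t i = 0; i < size; ++i) {
        *total = *total + a[i] - b[size - i - 1];
    }
}

/* Hand-transformed form: the total is kept in a local variable that the
 * compiler can hold in a register, and the index arithmetic is replaced
 * by two pointers that advance by one element per iteration. */
void sum_transformed(const int *a, const int *b, size_t size, int *total)
{
    int x = 0;
    const int *pa = a;          /* walks a forward            */
    const int *pb = b + size;   /* starts one past the end of b */
    for (size_t i = 0; i < size; ++i) {
        --pb;                   /* now points at b[size - i - 1] */
        x = x + *pa - *pb;
        ++pa;
    }
    *total = x;                 /* single store after the loop */
}
```

Both functions compute the same result; the second simply performs by hand what the optimizer is described as doing automatically at level 2.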


Part 3 - Optimizing C/C++ code at level 2 using compiler options

In this part you will learn about using various compiler options to optimize your C or C++ code at level 2 with minimal coding effort. You can instruct the compiler to use memory without checking for limits, target your application to a particular machine, optimize loops and perform array vectorization, emit debug information, and turn off language standard checking. The following are options that can be used in conjunction with -O2.

Standards compliance

You can use the -qlanglvl compiler option to determine whether source code and compiler options should be checked for conformance to a specific language standard, or subset or superset of a standard.

The compiler follows the type-based aliasing rule in the C/C++ standards when the -qalias=ansi compiler option is in effect (which is the default for some compiler invocations). Older code will often run into problems with type-based aliasing (also known as the ANSI aliasing rules).

If your code does not conform to C standard aliasing rules (for example, through use of unsafe pointer casts) you can use the -qalias=noansi compiler option. When noansi is in effect, the optimizer makes pessimistic aliasing assumptions. It assumes that all pointers can point to any object whose address is already taken, regardless of type. This has a performance impact on many programs, though, so it is not recommended for the performance-sensitive parts of your application.
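As an illustration of what an unsafe pointer cast looks like, the sketch below reads the bit pattern of a float. The pattern in the comment violates the type-based aliasing rule; the memcpy version is a conforming alternative. The function name float_bits_memcpy is hypothetical, and the example assumes that float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Pattern to avoid under -qalias=ansi: a float object is read through
 * an lvalue of an unrelated type, which the aliasing rule forbids.
 *
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Conforming alternative: copy the object representation byte by byte.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Rewriting such casts this way lets the code stay on the default aliasing assumptions instead of falling back to -qalias=noansi.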

If your C application does not define functions with names identical to those of library functions, compile with -qlibansi. This option assumes that all functions with the name of an ANSI C library function are in fact the system functions. When -qlibansi is in effect, the optimizer can generate better code because it will know about the behavior of a given function, such as whether or not it has any side effects.

Program size

Some techniques (for example, loop unrolling and array vectorization) that the optimizer uses to improve performance might also make the program larger. For systems with limited storage, you can use -qcompact to reduce the expansion that takes place. If your program has many loop and array language constructs, using the -qcompact option can reduce your application's overall performance. You might want to restrict this option to those parts of your program where optimization gains will remain unaffected.

Memory limits

If the following message displays:

1500-030: (I) INFORMATION: functionname: Additional optimization may be attained by recompiling and specifying the MAXMEM option with a value greater than value.

you need to specify -qmaxmem=-1 which allows the compiler to use memory as needed without checking for limits.

Inlining

Inlining functions improves performance by avoiding the overhead of a function call. You must specify a minimum optimization level of -O along with -Q (same as -qinline) to enable inlining of functions. Any function can be declared inline in the source code, but the compiler is free to ignore the keyword.
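A minimal sketch of a function that is a natural inlining candidate follows. The names square and sum_of_squares are hypothetical, and whether the call is actually inlined remains the compiler's decision.

```c
#include <stddef.h>

/* A small helper declared inline: the body is cheap compared with the
 * cost of a call, so replacing the call with the body can pay off. */
static inline int square(int v)
{
    return v * v;
}

/* The call to square() sits inside a loop, where call overhead would
 * otherwise be paid once per element. */
int sum_of_squares(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += square(values[i]);
    }
    return total;
}
```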

Diagnostics

Before you compile code with optimization level 2, test and debug it without optimization. Fix all your coding problems before optimizing your code at level 2. To get additional informational messages about potential problems in your program, use the -qinfo=all compiler option, which enables all diagnostic messages for all groups.

You can use the -qflag compiler option to limit the diagnostic messages to those of a specified severity level or higher.

Debugging

You can instruct the compiler to emit debug information using the -g compiler option. However, debugging optimized programs presents special usability problems. For example, loops are unrolled and the values assigned by expressions are consolidated. The line numbers of these statements in the optimized code no longer correspond to the line numbers in the original source, which hinders symbolic debugging. The -g option also turns off inlining unless you explicitly request the compiler to inline functions using the -qinline compiler option. To produce abbreviated debugging information in a smaller object size, you can use the -qlinedebug compiler option.


Compiler listings

You can instruct the compiler by using the -qsource compiler option to list the input source code with line numbers. If there is an error at a line, the associated error message appears after the source line. Lines containing macros have additional lines showing the macro expansion. By default, this section only lists the main source file.

The -qlist option lists the object code generated by the compiler. This section is useful for diagnosing execution time problems, if you suspect the program is not performing as expected due to a code generation error.

Architecture specific optimization

All PowerPC® machines share a common set of instructions, but might also include additional instructions unique to a given processor or processor family. The -qarch option can be used to target a specific processor architecture for compilation. This results in code that might not run on older architectures, but will fully utilize the capabilities of the selected architecture. The -qtune option directs the optimizer to bias optimization decisions for executing the application on a particular architecture, but does not prevent the application from running on other architectures. For more information about architecture-specific optimization, see the paper Optimizing C code at optimization level 2 at the Rational C/C++ café (ibm.com/rational/cafe).

Checkpoint:

• Optimization works best with source code that complies with the language standard
• Debugging is more problematic with an optimized executable
• -qinfo, -qlist options can be used to aid code debugging

Self-test questions:

• Why might debugging be harder with an optimized executable?
• What options must be specified to check standard compliance?
• How can -qlibansi improve performance?


Part 4 – Application of level 2 optimization

This part provides you with an example of how optimization level 2 can be used on a larger scale program. John Conway’s Game of Life (1) (2) is used as an illustration of how all of the previous parts come together to decrease runtime of the program’s executable.

Introduction

Game of Life is a cellular automaton invented by John Conway in 1970. Cells are arranged in a rectangular grid, with each cell being a square that has a maximum of 8 and a minimum of 3 neighboring cells. Three rules govern this automaton. First, if a live cell has 2 or 3 live neighbors, it stays live. Second, if a live cell has fewer than 2 or more than 3 live neighbors, it dies in the next generation. Third, if a dead cell has exactly 3 live neighbors, it becomes live in the next generation. The rules are applied to each cell simultaneously, so the transition from one state to the next is a one-step process that results in a new generation.
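The three rules can be sketched as a single C function that computes the next state of one cell. The function name next_cell_state and its parameter names are hypothetical and are used here only to illustrate the rules.

```c
/* Returns the next state of one cell: 1 for live, 0 for dead.
 * alive           - non-zero if the cell is currently live
 * live_neighbours - number of live cells among its neighbours (0..8) */
int next_cell_state(int alive, int live_neighbours)
{
    if (alive) {
        /* Rules 1 and 2: a live cell survives only with 2 or 3 live neighbours */
        return (live_neighbours == 2 || live_neighbours == 3) ? 1 : 0;
    }
    /* Rule 3: a dead cell becomes live with exactly 3 live neighbours */
    return (live_neighbours == 3) ? 1 : 0;
}
```

Because every cell must be updated from the same generation, a caller would read from one grid and write the results into a second grid rather than updating cells in place.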

First attempt at the code and some optimization

Suppose you don’t have a lot of time to implement this automaton. Assume also that the initial state is given by a set of points that are stored in a file called input.dat (provided in Appendix A), one row per point, where x and y are separated by whitespace. So, you jump straight into coding and after 30 minutes come up with something like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// rule[current state][number of live neighbours] gives the next state
int rule[2][9] = {{0,0,0,1,0,0,0,0,0},
                  {0,0,1,1,0,0,0,0,0}};

typedef struct {
    int cell[200][800];
} state;

void initialize(state * s, int numOfPoints, int height, int width)
{
    // Initialises the state pointed to by s. This is where we put
    // our initial conditions.
    int i, j;
    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            s->cell[i][j] = 0;
        }
    }
    int x[numOfPoints], y[numOfPoints];
    int count = 0;
    FILE* INPUT = fopen("input.dat", "r");
    if (INPUT == NULL) {
        perror("input.dat");
        exit(EXIT_FAILURE);
    }
    // Read at most numOfPoints "x y" pairs; stop at the first failed read
    while (count < numOfPoints &&
           fscanf(INPUT, "%d %d", &x[count], &y[count]) == 2) {
        count++;
    }
    fclose(INPUT);
    for (i = 0; i < count; i++) {
        s->cell[y[i]][x[i]] = 1;
    }
}

int nearestNeighbours(state *s, int i, int j, int height, int width)
{
    // Returns the number of nearest neighbours in the state *s at
    // location [i][j]. We just sum up the neighbouring 8 cells
    int neighbours =
        (i > 0 && j > 0 && s->cell[i-1][j-1])
      + (i > 0 && s->cell[i-1][j])
      + (i > 0 && j < width-1 && s->cell[i-1][j+1])
      + (j > 0 && s->cell[i][j-1])
      + (j < width-1 && s->cell[i][j+1])
      + (i < height-1 && j > 0 && s->cell[i+1][j-1])
      + (i < height-1 && s->cell[i+1][j])
      + (i < height-1 && j < width-1 && s->cell[i+1][j+1]);
    return neighbours;
}

void evolve(state * prev, state * next, int height, int width)
{
    // Evolves state *prev by one generation, returning the result
    // in *next.
    int i, j;
    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            next->cell[i][j] =
                rule[prev->cell[i][j]][nearestNeighbours(prev, i, j, height, width)];
        }
    }
}

void displayState(state * s)
{
    // Prints the top-left displayHeight x displayWidth corner of the state
    int displayWidth = 80;
    int displayHeight = 20;
    int i, j;
    system("clear");
    for (i = 0; i < displayHeight; i++) {
        for (j = 0; j < displayWidth; j++) {
            if (s->cell[i][j])
                printf("@");
            else
                printf(" ");
        }
        printf("\n");
    }
}

int main()
{
    state s0, s1;
    int i;
    initialize(&s0, 36, 200, 800);
    // Each iteration advances two generations, alternating the buffers
    for (i = 0; i < 2500; i++) {
        displayState(&s0);
        evolve(&s0, &s1, 200, 800);
        displayState(&s1);
        evolve(&s1, &s0, 200, 800);
    }
    return 0;
}

First, let’s compile this code without optimization (use the command xlc -o noopt gol.c). Once you have a clean compile, you can check the runtime by issuing time noopt >gol.out.

Now you can go ahead and start optimizing for better performance. Start with -O2 optimization only, issuing xlc -O2 -o lesson2 gol.c. Note that compilation might take slightly more time, although the difference is barely noticeable. However, if you time the executable now (time lesson2 >gol.out), you will see that the performance gain is quite substantial. There is a lot of opportunity for loop unrolling, value numbering and instruction scheduling in the above program.

Applying other options

At this stage you can start applying the options discussed earlier. An easy first choice is -qlibansi, which applies to our code because we do not redefine any ANSI library functions. Let’s also allow the compiler to judge which functions to inline by specifying the -qinline option. Thus, our compile command becomes:

xlc -O2 -qlibansi -qinline -o lesson3 gol.c

Inspecting the runtime we notice an even bigger decrease.

Our next step is to take advantage of the hardware. Suppose we are compiling this code on a production POWER4 machine, but it will actually be run on POWER5 once released. As you might recall, -qarch and -qtune are the options we need. Since we are not planning to run the application on any architectures other than POWER4 and POWER5, -qarch=pwr4 -qtune=auto should suffice for testing. But for maximum performance on POWER5, our compile command would be:

xlc -O2 -qlibansi -qinline -qarch=pwr5 -qtune=pwr5 -o lesson4 gol.c

Once again we check the runtime of the executable and see that it does provide performance improvement. A summary of average runtimes can be found in the table below.

Table of Runtimes: Code Attempt 1

Options                                            Average Runtime (sec)*
<N/A>                                              1
-O2                                                0.845
-O2 -qlibansi                                      0.789
-O2 -qlibansi -qarch=pwr5                          0.733
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5              0.715
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5 -qinline     0.647

*Note that your runtime might differ due to release and environment differences. Figures used here are runtimes normalized with time used at noopt.


Final code

Several coding techniques can improve the performance of your code and allow for better optimization. For more information about each of these techniques, see the paper Optimizing C code at optimization level 2 at the Rational C/C++ café (ibm.com/rational/cafe). Now, let’s see how we can apply these coding techniques. As you know, low-level I/O functions work best with optimization. So, we modify the initialize function as shown in the code example below. Note the use of open, read and atoi instead of the usual fopen and fscanf. In the displayState function we resort to the more specific putchar function, as opposed to the more general printf function.

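static void initialize(state * s, int numOfPoints)
{
    int i, j;
    for (i = 0; i < Height; i++) {
        for (j = 0; j < Width; j++) {
            s->cell[i][j] = 0;
        }
    }
    // One extra byte so the data can be null-terminated for atoi
    char buffer[numOfPoints*6 + 1];
    int x[numOfPoints], y[numOfPoints];
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) {
        perror("input.dat");
        exit(EXIT_FAILURE);
    }
    ssize_t len = read(fd, buffer, numOfPoints*6);
    close(fd);
    if (len < 0) len = 0;
    buffer[len] = '\0';
    // Each record is 6 bytes: two 2-digit numbers, each followed by
    // one separator character (the last separator may be missing)
    int count = (int)((len + 1) / 6);
    for (i = 0, j = 0; j < count; i += 6, j++) {
        x[j] = atoi(&buffer[i]);
        y[j] = atoi(&buffer[i+3]);
    }
    for (i = 0; i < count; i++) {
        s->cell[y[i]][x[i]] = 1;
    }
}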

Next, let’s look at function call optimization. The idea here is to give the optimizer as much information as possible, to eliminate worst-case assumptions. First, since all we have is one source file and our functions are not called by anything outside of this file’s scope, we prefix the static keyword to all our function declarations. Also, to make interprocedural analysis work better, let’s provide function prototypes before all definitions. Because the nearestNeighbours function does not have any side effects, we can mark it with #pragma isolated_call.

static void initialize(state * s, int numOfPoints);
static int nearestNeighbours(int i, int j, state *s);
static void displayState(state * s);
static void evolve(state * prev, state * next);
#pragma isolated_call(nearestNeighbours)

Now we look at variable optimization. To start with, the height and width variables stay constant throughout the code. So, we can let the compiler do the numeric substitution at compile time by defining the compile-time constants Height and Width using the #define directive. The state structure can also be made smaller and more efficient by using short instead of int for the array type. Also, instead of having the rule array in global scope, we move it to the only place where it is used, that is, the evolve function. You might wish to declare rule as static, for additional optimization benefit. In the nearestNeighbours function we remove the unnecessary variable, neighbours, and perform all calculations as part of the return statement.

As a result of the above transformations, we now arrive at a final version of the code. Remember that the purpose of these code changes is to improve the runtime of the executable when optimization options are applied. Therefore, it is possible that without -O2, our new program will run slightly slower than its unmodified counterpart. However, as you can see in the Table of Runtimes: Code Attempt 2, the new version does provide performance gains with -O2 and related options.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

//Defines the size of the universe
#define Width 800
#define Height 200
#define displayW 80
#define displayH 20

typedef struct {
    unsigned short int cell[Height][Width];
} state;

// Function prototypes
static void initialize(state * s, int numOfPoints);
static int nearestNeighbours(int i, int j, state *s);
static void displayState(state * s);
static void evolve(state * prev, state * next);

static void initialize(state * s, int numOfPoints)
{
    int i, j;
    for (i = 0; i < Height; i++) {
        for (j = 0; j < Width; j++) {
            s->cell[i][j] = 0;
        }
    }
    // One extra byte so the data can be null-terminated for atoi
    char buffer[numOfPoints*6 + 1];
    int x[numOfPoints], y[numOfPoints];
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) {
        perror("input.dat");
        exit(EXIT_FAILURE);
    }
    ssize_t len = read(fd, buffer, numOfPoints*6);
    close(fd);
    if (len < 0) len = 0;
    buffer[len] = '\0';
    // Each record is 6 bytes: two 2-digit numbers, each followed by
    // one separator character (the last separator may be missing)
    int count = (int)((len + 1) / 6);
    for (i = 0, j = 0; j < count; i += 6, j++) {
        x[j] = atoi(&buffer[i]);
        y[j] = atoi(&buffer[i+3]);
    }
    for (i = 0; i < count; i++) {
        s->cell[y[i]][x[i]] = 1;
    }
}

#pragma isolated_call(nearestNeighbours)
inline static int nearestNeighbours(int i, int j, state *s)
{
    // Sums the 8 neighbouring cells, skipping those outside the grid
    return (i > 0 && j > 0 && s->cell[i-1][j-1])
         + (i > 0 && s->cell[i-1][j])
         + (i > 0 && j < Width-1 && s->cell[i-1][j+1])
         + (j > 0 && s->cell[i][j-1])
         + (j < Width-1 && s->cell[i][j+1])
         + (i < Height-1 && j > 0 && s->cell[i+1][j-1])
         + (i < Height-1 && s->cell[i+1][j])
         + (i < Height-1 && j < Width-1 && s->cell[i+1][j+1]);
}

static void evolve(state * prev, state * next)
{
    // rule[current state][number of live neighbours] gives the next state
    static int rule[2][9] = {{0,0,0,1,0,0,0,0,0},
                             {0,0,1,1,0,0,0,0,0}};
    int i, j;
    for (i = 0; i < Height; i++) {
        for (j = 0; j < Width; j++) {
            next->cell[i][j] =
                rule[prev->cell[i][j]][nearestNeighbours(i, j, prev)];
        }
    }
}

static void displayState(state * s)
{
    system("clear");
    int i, j;
    for (i = 0; i < displayH; i++) {
        for (j = 0; j < displayW; j++) {
            if (s->cell[i][j])
                putchar('@');
            else
                putchar(' ');
        }
        putchar('\n');
    }
}

int main(int argc, char **argv)
{
    state s0, s1;
    int i;
    initialize(&s0, 36);
    // Each iteration advances two generations, alternating the buffers
    for (i = 0; i < 2500; i++) {
        displayState(&s0);
        evolve(&s0, &s1);
        displayState(&s1);
        evolve(&s1, &s0);
    }
    return 0;
}

Table of Runtimes: Code Attempt 2

Options                                            Average Runtime*
<N/A>                                              1
-O2                                                0.845
-O2 -qlibansi                                      0.789
-O2 -qlibansi -qarch=pwr5                          0.733
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5              0.715
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5 -qinline     0.647

*Figures used here are normalized with runtime at noopt of code attempt 1.


To see the effect of optimization level 2 and how it affects the code, you can use the -qlist compiler option to generate a .lst file that contains the assembly code generated by the compiler. The following table shows the length of the assembly code generated with the same source code (the improved code is used for this purpose) and different compiler options.

Table of Number of Lines of Assembly Code

Options                                            Approx. number of lines of assembly code
<N/A>                                              638
-O2                                                500
-O2 -qlibansi                                      503
-O2 -qlibansi -qarch=pwr5                          479
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5              479
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5 -qinline     550

-O2 reduces the number of instructions substantially (from about 638 to about 500 lines in this example), which directly affects performance. When -qinline is used, the .lst file shows that all the functions are inlined within main. For details of the assembly instructions, you can refer to the .lst file generated with the -qlist option.

The -qcompact compiler option can be used with -O2 when a smaller sized executable file is favored. When this option is in use, the compiler avoids optimizations that increase code size. The following table illustrates the comparison of code size with and without -qcompact. You can run the executable files yourself to explore the effect on performance with -qcompact.

Table of Executable Sizes: Code Attempt 2

Options                                            w/o -qcompact (bytes)   -qcompact (bytes)
<N/A>                                              9379                    9379
-O2                                                8640                    8430
-O2 -qlibansi                                      8594                    8393
-O2 -qlibansi -qarch=pwr5                          8370                    8329
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5              8370                    8329
-O2 -qlibansi -qarch=pwr5 -qtune=pwr5 -qinline     8394                    8329

Higher optimization levels

In follow-on tutorials, we will look at applying advanced optimization levels to our improved code. As you increase the optimization level in compiling your program, performance of the executable will vary more and more from one program to another. More aggressive code transformations might be performed at these optimization levels, and larger precision and storage tradeoffs might take place. Nonetheless, if your goal is to develop high performance software, higher optimization levels are definitely worth looking into.

Checkpoint

Summary:

• Your code does not need to be perfect for optimization purposes
• Depending on your source code, target architecture and memory availability, optimization options provide marginal to major performance gains
• While no-optimization runtime might increase, simple code changes often provide considerable advantage when compiled with optimization options

Self-test questions:

• If your program does not overload ANSI functions, what optimization option could you benefit from?

• What are some performance related considerations for function declarations?
• Now that you know more about optimization level 2, how will you use it in your current project(s)?


Summary

Over the course of this tutorial you learned what the benefits of using optimization level 2 are and how to apply various optimization options. You also learned what code changes can help optimization algorithms to improve your program’s performance.

Takeaway points

• Optimization options work best on errorless code that conforms to language standards

• Due to code transformation during optimization process, debugging opportunities might be inhibited

• Basic optimization does not require any code changes. Any conforming code will be optimized to a certain extent

• Although the -O2 option provides a good boost in performance, additional optimization options can be just as important in reducing the runtime

• To get maximum results from optimization options your code needs to provide as much useful information (for example, #pragma expected_value or inline directives) to the compiler as possible

Additional resources

If you would like to learn more about all the different optimization options consult the IBM XL C/C++ for AIX, V10.1 Compiler Reference, SC23-8886-00. Start with options that we have talked about in this tutorial and continue learning about optimization levels 3, 4 and 5.

Guidelines on writing code that is best suited for optimization can be found in IBM XL C/C++ for AIX, V10.1 Optimization and Programming Guide, SC23-8833-00. Here you will find more information about efficient I/O methods, use of built-in functions, as well as additional notes on how to improve performance with compiler options.

Be it optimization options or code changes, there is abundant input from knowledgeable professionals in the Rational C/C++ Café. Simply type “optimization” in the search bar and you will be pointed to a number of useful documents and threads with further discussions on the subject.

References

(1) Mathematical Games by Martin Gardner. The fantastic combinations of John Conway's new solitaire game "Life" in October 1970 Scientific American, pages 120-123.

(2) Mathematical Games by Martin Gardner. On cellular automata, self-reproduction, the Garden of Eden and the game "Life" in February 1971 Scientific American, cover and pages 112-117.


Contacting IBM

IBM welcomes your comments. You can send them to [email protected]


Appendix A

20 15
21 15
20 16
21 16
30 15
30 16
30 17
31 14
31 18
32 13
33 13
32 19
33 19
34 16
35 14
35 18
36 15
36 16
36 17
37 16
40 15
40 14
40 13
41 15
41 14
41 13
42 12
42 16
44 16
44 17
44 12
44 11
54 13
55 13
54 14
55 14


January 2010 References in this document to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM program product in this publication is not intended to state or imply that only IBM’s program product may be used. Any functionally equivalent program may be used instead. IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at ″Copyright and trademark information″ at www.ibm.com/legal/copytrade.shtml © Copyright International Business Machines Corporation 2010. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.