Static Branch Frequency and Program Profile Analysis

Static Branch Frequency and Program Profile Analysis Divino César Soares Lucas [email protected] Laboratório de Sistemas de Computação Instituto de Computação UNICAMP Youfeng Wu [email protected] Intel Labs James R. Larus [email protected] University of Wisconsin


Page 1: Static Branch Frequency and Program Profile Analysis

Static Branch Frequency and

Program Profile Analysis

Divino César Soares Lucas

[email protected]

Laboratório de Sistemas de Computação

Instituto de Computação

UNICAMP

Youfeng Wu

[email protected]

Intel Labs

James R. Larus

[email protected]

University of Wisconsin

Page 2: Static Branch Frequency and Program Profile Analysis

Schedule

1. Introduction

2. Related Work

3. Key Idea

4. Branch Prediction

5. Branch Probabilities

6. Combining Predictions

7. Local Block and Edge Frequency

8. From Local to Global Frequencies

9. Results

10. Conclusion

11. References

Page 3: Static Branch Frequency and Program Profile Analysis

Introduction

• What is a program profile?

• Dynamic profile

• Static profile

• Why do we need a profile?

• Instruction scheduling

• Identifying program bottlenecks

• Enhance memory locality

Page 4: Static Branch Frequency and Program Profile Analysis

Related Work

• Dynamic profile

• Work centered on reducing profiling overhead [3, 6]

• Static profile

• Simple estimation heuristics [4]

• Estimation based on Markov models [5]

Page 5: Static Branch Frequency and Program Profile Analysis

Key Idea [1]

• Predict Branches

• Use heuristics

• Compute Probabilities

• Use heuristic hit rates

• Compute Frequency

• Use probabilities

Page 6: Static Branch Frequency and Program Profile Analysis

Branch Prediction

• A branch prediction states whether a branch will be taken or not taken. It is a binary decision.

• Some static heuristics [2]:

• LBH - Loop Branch Heuristic

• PH - Pointer Heuristic

• OH - Opcode Heuristic

• GH - Guard Heuristic

• LEH - Loop Exit Heuristic

• LHH - Loop Header Heuristic

• CH - Call Heuristic

• SH - Store Heuristic

• RH - Return Heuristic

Page 7: Static Branch Frequency and Program Profile Analysis

Branch Probabilities

• A branch probability is an estimate of how likely a branch is to be taken. It is a continuous value in [0, 1].

Heuristic               Hit rate
Loop Branch Heuristic   88%
Pointer Heuristic       60%
Opcode Heuristic        84%
Guard Heuristic         62%
Loop Exit Heuristic     80%
Loop Header Heuristic   75%
Call Heuristic          78%
Store Heuristic         55%
Return Heuristic        72%

• We will use these hit rates as branch probabilities.

Page 8: Static Branch Frequency and Program Profile Analysis

Combining Predictions

• What happens if two or more heuristics are applicable?

if (k < 0) then
    k = y;
else
    return;
end-if

• OH predicts the then part (with an 84% hit rate).

• RH predicts the else part (with a 72% hit rate).

• In these situations we use the Dempster-Shafer algorithm…

Page 9: Static Branch Frequency and Program Profile Analysis

Combining Predictions

• Each branch has a set of possible targets. In our case there are two, taken or not taken:

$$B = \{t_1, t_2\}$$

• Each heuristic gives evidence for each outcome:

$$h_1(t_1) = a \qquad h_1(t_2) = 1 - a$$

$$h_2(t_1) = b \qquad h_2(t_2) = 1 - b$$

• The Dempster-Shafer algorithm combines this evidence:

$$(h_1 \oplus h_2)(t_1) = \frac{h_1(t_1)\,h_2(t_1)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$

$$(h_1 \oplus h_2)(t_2) = \frac{h_1(t_2)\,h_2(t_2)}{h_1(t_1)\,h_2(t_1) + h_1(t_2)\,h_2(t_2)}$$
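The combination rule above can be sketched in a few lines of Python. This is only an illustration for the two-outcome case. The function name `combine` is hypothetical, and the inputs reuse the OH/RH hit rates from the earlier if/else example, with RH's 72% for the else part expressed as 28% for the then part.

```python
def combine(p1: float, p2: float) -> float:
    """Combine two estimates that outcome t1 happens (two-outcome case).

    p1 and p2 play the roles of h1(t1) and h2(t1); h(t2) is 1 - h(t1).
    """
    both_t1 = p1 * p2
    both_t2 = (1.0 - p1) * (1.0 - p2)
    total = both_t1 + both_t2
    if total == 0.0:
        # One heuristic is certain of t1 and the other certain of t2.
        raise ValueError("predictions are in total conflict")
    return both_t1 / total


# OH gives the 'then' part 0.84; RH gives the 'else' part 0.72,
# which is 1 - 0.72 = 0.28 for the 'then' part.
then_prob = combine(0.84, 1.0 - 0.72)
print(round(then_prob, 3))  # 0.671
```

Under these assumptions the conflicting heuristics settle on roughly 0.67 for the then part, closer to the heuristic with the higher hit rate.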

Page 10: Static Branch Frequency and Program Profile Analysis

Combining Predictions

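Example:

$$h_1(t_1) = 0.5 \qquad h_1(t_2) = 0.5$$

$$h_2(t_1) = 0.7 \qquad h_2(t_2) = 0.3$$

$$(h_1 \oplus h_2)(t_1) = \frac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.7$$

$$(h_1 \oplus h_2)(t_2) = \frac{0.5 \times 0.3}{0.5 \times 0.7 + 0.5 \times 0.3} = 0.3$$

$$h_3(t_1) = 0.6 \qquad h_3(t_2) = 0.4$$

$$(h_2 \oplus h_3)(t_1) = \frac{0.7 \times 0.6}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.778$$

$$(h_2 \oplus h_3)(t_2) = \frac{0.3 \times 0.4}{0.7 \times 0.6 + 0.3 \times 0.4} = 0.222$$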
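When more than two heuristics apply, one possible approach is to fold the pairwise rule over all of them. The sketch below assumes that reading; `combine_all` is a hypothetical helper, and it starts from 0.5 because, as the first example shows, an estimate of 0.5 leaves the other estimate unchanged.

```python
from functools import reduce


def combine(p1, p2):
    """Pairwise two-outcome combination of two estimates for t1."""
    both_t1 = p1 * p2
    both_t2 = (1.0 - p1) * (1.0 - p2)
    return both_t1 / (both_t1 + both_t2)


def combine_all(probs):
    """Fold the pairwise rule over any number of estimates for t1."""
    # 0.5 carries no information, so it is a safe starting value.
    return reduce(combine, probs, 0.5)


print(round(combine_all([0.5, 0.7]), 3))       # 0.7
print(round(combine_all([0.7, 0.6]), 3))       # 0.778
print(round(combine_all([0.5, 0.7, 0.6]), 3))  # 0.778
```

The last call shows that adding the uninformative estimate 0.5 to the pair (0.7, 0.6) does not change the result. The sketch omits handling of fully contradictory inputs (one estimate 0 and another 1), which would divide by zero.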

Page 11: Static Branch Frequency and Program Profile Analysis

Local Block and Edge Frequency

• The block/edge frequency is an estimate of how often a block is executed or an edge is taken.

• We calculate local block/edge frequencies by propagating branch probabilities, that is:

$$bfreq(b_i) = \begin{cases} 1 & \text{if } b_i \text{ is the entry block} \\ \sum_{b_p \in pred(b_i)} freq(b_p \to b_i) & \text{otherwise} \end{cases}$$

$$freq(b_i \to b_j) = bfreq(b_i) \cdot prob(b_i \to b_j)$$

• But these formulas do not work when the flow graph contains a cycle!
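For the acyclic case, the propagation can be sketched as a single pass over the blocks in topological order. This is an illustration, not the paper's algorithm: the function name, the edge-dictionary representation, and the diamond-shaped example graph are all assumptions.

```python
from collections import defaultdict, deque


def local_frequencies(entry, edges):
    """Propagate branch probabilities through an acyclic flow graph.

    edges maps (src, dst) to prob(src -> dst). The entry block is
    assumed to have no predecessors and the graph to have no cycles.
    """
    succs = defaultdict(list)
    pending_preds = defaultdict(int)
    for (src, dst), prob in edges.items():
        succs[src].append((dst, prob))
        pending_preds[dst] += 1

    bfreq = defaultdict(float)
    bfreq[entry] = 1.0
    efreq = {}
    ready = deque([entry])
    while ready:
        block = ready.popleft()
        for dst, prob in succs[block]:
            # freq(bi -> bj) = bfreq(bi) * prob(bi -> bj)
            efreq[(block, dst)] = bfreq[block] * prob
            # bfreq(bj) is the sum over its incoming edge frequencies.
            bfreq[dst] += efreq[(block, dst)]
            pending_preds[dst] -= 1
            if pending_preds[dst] == 0:
                ready.append(dst)
    return dict(bfreq), efreq


# Hypothetical diamond: entry branches to a or b, both fall into exit.
edges = {("entry", "a"): 0.7, ("entry", "b"): 0.3,
         ("a", "exit"): 1.0, ("b", "exit"): 1.0}
bfreq, efreq = local_frequencies("entry", edges)
print({block: round(freq, 3) for block, freq in bfreq.items()})
```

A block is processed only after all of its predecessors, which is exactly the ordering that breaks down once a back edge is present.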

Page 12: Static Branch Frequency and Program Profile Analysis

Local Block and Edge Frequency

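For a loop with header $b_0$ and back edges $b_i \to b_0$ ($i = 1, \dots, k$), where $bfreq(b_i) = r_i \, bfreq(b_0)$:

$$\begin{aligned}
bfreq(b_0) &= in\_freq(b_0) + \sum_{i=1}^{k} freq(b_i \to b_0) \\
&= in\_freq(b_0) + \sum_{i=1}^{k} bfreq(b_i)\, prob(b_i \to b_0) \\
&= in\_freq(b_0) + \sum_{i=1}^{k} bfreq(b_0)\, r_i\, prob(b_i \to b_0) \\
&= in\_freq(b_0) + bfreq(b_0) \sum_{i=1}^{k} r_i\, prob(b_i \to b_0)
\end{aligned}$$

Let

$$cp(b_0) = \sum_{i=1}^{k} r_i\, prob(b_i \to b_0)$$

$$bfreq(b_0) = in\_freq(b_0) + bfreq(b_0)\, cp(b_0)$$

$$bfreq(b_0) = \frac{in\_freq(b_0)}{1 - cp(b_0)}$$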

Page 13: Static Branch Frequency and Program Profile Analysis

Local Block and Edge Frequency

Example:

$$bfreq(b_0) = \frac{1}{1 - 0.88 - 0.88 \times 0.12 - 0.88 \times 0.12 \times 0.12} = 578.70$$
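The arithmetic of this example can be checked with a short Python sketch. The helper name `header_frequency` is hypothetical. It takes each back edge's contribution to the cyclic probability (the product of its relative frequency and its probability) as a single number, because the slide gives only those products. Rejecting a cyclic probability of 1 or more is an assumption made here to avoid a division by zero or a negative result.

```python
def header_frequency(in_freq, back_edge_terms):
    """Frequency of a loop header: in_freq / (1 - cp).

    back_edge_terms holds one value per back edge, each already the
    product r_i * prob(b_i -> b_0); their sum is cp.
    """
    cp = sum(back_edge_terms)
    if cp >= 1.0:
        # Assumption of this sketch: such a loop would never exit.
        raise ValueError("cyclic probability must be below 1")
    return in_freq / (1.0 - cp)


# The three terms that appear in the denominator of the example.
terms = [0.88, 0.88 * 0.12, 0.88 * 0.12 * 0.12]
print(f"{header_frequency(1.0, terms):.2f}")  # 578.70
```

The terms sum to 0.998272, so the header is estimated to run about 579 times per entry into the loop.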

Page 14: Static Branch Frequency and Program Profile Analysis

From Local to Global Frequencies

• Considering one invocation of f, the frequency with which f calls another function g is:

$$lfreq(f, g) = \sum_{b_i \in f} bfreq(b_i) \cdot calls(b_i, g)$$

• The global frequency of f calling g is:

$$gfreq(f, g) = cfreq(f) \cdot lfreq(f, g)$$

• Where:

$$cfreq(f) = \begin{cases} 1 & \text{if } f \text{ is the main function} \\ \sum_{p \in pred(f)} gfreq(p, f) & \text{otherwise} \end{cases}$$

• The global block/edge frequency can be calculated by multiplying the function's execution frequency by the local block/edge frequency.
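For a call graph without recursion, these definitions can be sketched as one pass over the functions, callers before callees. All names, the data layout, and the example numbers below are illustrative assumptions. Recursive calls would need the same kind of cycle treatment as loops and are not handled here.

```python
def local_call_freq(bfreq, calls, callee):
    """lfreq(f, callee): sum of bfreq(b) * calls(b, callee) over blocks of f."""
    return sum(freq * calls.get(block, {}).get(callee, 0)
               for block, freq in bfreq.items())


def global_call_freqs(order, lfreq, main):
    """Compute cfreq and gfreq for an acyclic call graph.

    order lists the functions with every caller before its callees;
    lfreq maps (caller, callee) to the local call frequency.
    """
    cfreq = {f: 0.0 for f in order}
    cfreq[main] = 1.0
    gfreq = {}
    for f in order:
        for (caller, callee), value in lfreq.items():
            if caller == f:
                gfreq[(f, callee)] = cfreq[f] * value
                cfreq[callee] += gfreq[(f, callee)]
    return cfreq, gfreq


# Hypothetical program: main has two blocks that call f.
main_bfreq = {"b0": 1.0, "b1": 0.5}
main_calls = {"b0": {"f": 1}, "b1": {"f": 2}}
lfreq = {("main", "f"): local_call_freq(main_bfreq, main_calls, "f"),
         ("main", "g"): 1.0,
         ("f", "g"): 3.0}
cfreq, gfreq = global_call_freqs(["main", "f", "g"], lfreq, "main")
print(cfreq)
print(gfreq)
```

Multiplying `cfreq[f]` by a local block or edge frequency inside f then gives the corresponding global frequency.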

Page 15: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 local block frequency:

Page 16: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 local edge frequency:

Page 17: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 local edge frequency:

Page 18: Static Branch Frequency and Program Profile Analysis

Results

• Results came from SPECint92 C benchmarks and some

Unix applications.

• The system used was a Sequent S2000/750 with i486

processors and the Sequent DYNIX/ptx C compiler 2.1.

• Accuracy is measured with Wall's [5] weighted and unweighted match scores.

Page 19: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 global function call frequency:

Page 20: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 global block frequency:

Page 21: Static Branch Frequency and Program Profile Analysis

Results

• Scores of SPEC92 global edge frequency:

Page 22: Static Branch Frequency and Program Profile Analysis

Results

• Scores for Unix commands:

Page 23: Static Branch Frequency and Program Profile Analysis

Conclusion

• A new technique for static profiling was presented.

• The technique introduced a new way to combine multiple pieces of evidence for a branch outcome.

• Although the heuristic hit rates were measured in a different environment, they still produced good results.

Page 24: Static Branch Frequency and Program Profile Analysis

References

[1] Y. Wu and J. R. Larus. Static Branch Frequency and Program Profile Analysis. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 1-11, 1994.

[2] T. Ball and J. R. Larus. Branch prediction for free. In SIGPLAN Conference on Programming Language Design and Implementation, pages 300-313, 1993.

[3] T. Ball and J. R. Larus. Optimally profiling and tracing programs. ACM Transactions on Programming Languages and Systems, 16(4):1319-1360, July 1994.

[4] T. A. Wagner, V. Maverick, S. L. Graham, and M. A. Harrison. Accurate static estimators for program optimization. In Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation, pages 85-96. ACM Press, 1994.

Page 25: Static Branch Frequency and Program Profile Analysis

References

[5] D. W. Wall. Predicting Program Behavior Using Real or Estimated Profiles. In Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation, pages 59-70, 1991.

[6] V. Sarkar. Determining average program execution times and their variance. In SIGPLAN Conference on Programming Language Design and Implementation, pages 298-312, 1989.