Presented by: Sameer Kulkarni Dept of Computer & Information Sciences University of Delaware
CISC 879 - Machine Learning for Solving Systems Problems
Presented by: John Tully
Dept of Computer & Information Sciences
University of Delaware
Using Machine Learning to Guide Architecture Simulation
Greg Hamerly (Baylor University)
Erez Perelman, Jeremy Lau, Brad Calder (UCSD)
Timothy Sherwood (UCSB)
Journal of Machine Learning Research 7 (2006)
http://cseweb.ucsd.edu/~calder/papers/JMLR-06-SimPoint.pdf
Simulation is Critical!
• Allows engineers to understand cycle-level behavior of processor before fabrication
• Can play with design options cheaply. How are performance, complexity, area, power affected when I make modification X, and remove feature Y?
But... Simulation is SLOW
• Modelling at cycle level is very slow
• SimpleScalar in cycle-accurate mode: a few hundred million cycles per hour
• Modelling at gate level is very, very, very slow
• ETI cutting-edge emulation technology: 5,000 cycles/second (24 hours = ~1 second of Cyclops-64 instructions).
Demands are increasing
• Size of benchmarks: applications can be quite large.
• Number of programs: industry-standard benchmarks are large suites. Many focus on variety (e.g. SPEC – 26 programs that stress ALUs, FPUs, memory, cache, etc.)
• Iterations required: experimenting with just one feature (e.g. cache size) can take hundreds of thousands of benchmark runs
‘Current’ Remedies
• Simulate programs for N instructions (whatever your timeframe allows), and just stop.
• Similarly, fast-forward through initialization portion, and then simulate N instructions.
• Simulate N instructions from only the “important” (most computationally intensive) portions of a program.
• None of these works well, and at their worst they are embarrassing: error rates of almost 4,000%!
SimPoint to the Rescue
• 1. As a program executes, its behavior changes. The changes aren’t random – they’re structured as sequences of recurring behavior (termed phases).
• 2. If repetitive and structured behavior can be identified, then we only need to sample each unique behavior of a program (and not the whole thing) to get an idea for its execution profile.
• 3. How can we identify repetitive, structured behavior? Use machine learning!
• Now, only a small set of samples is needed. Collect points from each phase (simulation points), and weight them – this accurately depicts execution of the entire program.
Defining Phase Behavior
• Seems pretty easy at first... let's just collect hardware-based statistics, and classify phases accordingly
• CPI (performance)
• Cache Miss Rates
• Branch Statistics (Frequency, Prediction Rate)
• FPU instructions per cycle
• But what's the problem here?
Defining Phase Behavior
• Problem: if we use hardware-based stats, we're tying phases to architectural configuration!
• Every time we tweak architecture, we must re-define phases!
• Underlying methodology: identify phases without relying on architectural metrics. Then, we can find a set of samples that can be used across our entire design space.
• But what can we use that's independent of hardware-based statistics, but still relates to fundamental changes in what the hardware is doing?
Defining Phase Behavior
• Basic Block Vector (BBV): a structure designed to capture how a program changes behavior over time.
• A distribution of how many times each basic block is executed over an interval (can use a 1D-array)
• Each entry weighted by # of instructions in the BB (so all instructions have equal weight).
• Subsets of information in BBVs can also be extracted
• Register usage vectors
• Loop / branch execution frequencies
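A minimal sketch of building BBVs, assuming the profiler yields a trace of basic-block IDs in execution order. The names `build_bbvs`, `block_sizes`, and `interval_len` are hypothetical and used only for illustration; each entry is weighted by block size and normalized so the vector is a distribution.

```python
from collections import Counter

def _normalize(counts, block_ids):
    # Turn instruction-weighted counts into a distribution over basic blocks.
    total = sum(counts.values())
    return [counts[b] / total for b in block_ids]

def build_bbvs(trace, block_sizes, interval_len):
    """Split a trace of basic-block IDs into intervals; return one BBV per interval.

    trace:        basic-block IDs in execution order (hypothetical profiler output)
    block_sizes:  dict mapping block ID -> number of instructions in the block
    interval_len: interval length, in instructions
    """
    block_ids = sorted(block_sizes)
    bbvs, counts, executed = [], Counter(), 0
    for block in trace:
        # Weight each execution by block size so every instruction counts equally.
        counts[block] += block_sizes[block]
        executed += block_sizes[block]
        if executed >= interval_len:
            bbvs.append(_normalize(counts, block_ids))
            counts, executed = Counter(), 0
    if executed:  # keep a trailing partial interval
        bbvs.append(_normalize(counts, block_ids))
    return bbvs

# Toy program: a loop over blocks A and B, followed by a loop over block C.
block_sizes = {"A": 4, "B": 2, "C": 10}
trace = ["A", "B", "A", "B", "A", "B", "C", "C"]
bbvs = build_bbvs(trace, block_sizes, interval_len=18)
for vector in bbvs:
    print(vector)
```

The two intervals produce very different vectors, which is the signal the later clustering step relies on.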
Defining Phase Behavior
• Now, we can use BBVs to find patterns in the program. But can we prove they're useful?
• Detailed study by Lau et al.: very strong correlation between the following:
• 1) Difference in BBV of the interval, and BBV of the whole program (code changes)
• 2) CPI of the interval (performance)
• Graphic on next slide......
• Things are looking really good now – we can create a set of phases (and therefore, points to simulate) by ONLY looking at executed code.
Defining Phase Behavior
Extracting Phases
• Next step: how do I actually turn my BBV vectors into phases?
• Create a function to compare two BBVs: how similar are they?
• Use machine learning data clustering algorithms to group similar BBVs. Each cluster (set of similar points) = a phase!
• SimPoint is the implementation of this
• Profiles programs (divides them into intervals, and creates BBVs for each).
• Uses the k-means clustering algorithm. Input includes the granularity of clusters, which dictates the size and abundance of phases!
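The clustering step can be sketched as a plain k-means over BBVs. This is an illustration under stated assumptions, not SimPoint's actual code: squared Euclidean distance is assumed as the comparison function, and farthest-first seeding is swapped in to keep the example deterministic. All function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two BBVs (assumed similarity function)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def init_centroids(vectors, k):
    # Farthest-first seeding: deterministic, chosen here for reproducibility.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(vectors, k, max_iters=50):
    """Group BBVs into k clusters; each cluster is treated as one phase."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each interval joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Five toy intervals whose behavior alternates between two phases, then a third.
bbvs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels, centroids = kmeans(bbvs, k=3)
print(labels)
```

Intervals 0 and 2 land in one phase and intervals 1 and 3 in another, even though they are not adjacent in time. Choosing `k` is the granularity knob the slide mentions.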
Choosing Simulation Pts
• Final Step: choose simulation points. From each phase, SimPoint chooses one representative interval that will be simulated (in full detail) to represent the whole phase.
• All points in the phase are (theoretically) similar in performance statistics – so we can extrapolate.
• Machine learning also used to pick representative points of a cluster (the interval to use from a phase).
• Points are weighted based on interval size (and phase size, of course)
• Only needs to be done once per program+input combination – remember why?
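A sketch of how simulation points and weights might be derived from a clustering. It assumes fixed-length intervals, that the representative is the interval closest to the cluster centre, and that a phase's weight is its share of all intervals. The function names and the CPI values are hypothetical.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def pick_simulation_points(vectors, labels, centroids):
    """Return {cluster: (representative_interval_index, weight)}."""
    points = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue
        # Representative: the member interval nearest the cluster centre.
        rep = min(members, key=lambda i: sq_dist(vectors[i], centroid))
        # Weight: fraction of all intervals that belong to this phase.
        points[c] = (rep, len(members) / len(vectors))
    return points

def estimate_cpi(points, simulated_cpi):
    """Weighted average of the CPI measured at each simulation point."""
    return sum(weight * simulated_cpi[rep] for rep, weight in points.values())

vectors = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]
labels = [0, 0, 0, 1]
centroids = [[0.9, 0.1], [0.1, 0.9]]
points = pick_simulation_points(vectors, labels, centroids)

# Made-up CPI values from detailed simulation of intervals 0 and 3 only.
simulated_cpi = {0: 1.0, 3: 3.0}
print(points, estimate_cpi(points, simulated_cpi))
```

Nothing here depends on cache sizes or latencies, so the same points and weights can be reused for every architecture configuration; only `simulated_cpi` changes.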
Choosing Simulation Pts
• User can tweak interval length, # clusters, etc – tradeoff between accuracy (more points simulated) and simulation time.
Experimental Framework
• Test Programs: SPEC Benchmarks (26 applications, about half integer, half FP; designed to stress all aspects of a processor.)
• Simulation: SimpleScalar, Alpha architecture.
• Metrics: accuracy of simulation measured in CPI prediction error
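One common way to express CPI prediction error is relative error against the full-simulation CPI; the sketch below assumes that definition, and the numbers are made up.

```python
def cpi_error(estimated_cpi, true_cpi):
    """Relative CPI error of the sampled estimate versus full simulation."""
    return abs(estimated_cpi - true_cpi) / true_cpi

# Hypothetical values: the estimate from simulation points vs. a full run.
error = cpi_error(estimated_cpi=1.5, true_cpi=1.25)
print(f"{error:.1%}")
```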
Million Dollar Question...
• How does phase classification do?
• SPEC2000, 100 million instruction intervals, no more than 10 simulation points
• Gzip, Gcc: only 4 and 8 phases found, respectively
Million Dollar Question...
• How accurate is this thing?
• A lot better than “current” methods...
Million Dollar Question...
• How much time are we saving?
• In the previous result, we're only simulating 400-800 million instructions for the SimPoint results. According to the SPEC benchmark data sheet, the 'reference' input configurations are 50 billion and 80 billion instructions, respectively.
• So, baseline simulation needed to execute ~100 times more instructions for this configuration – took several months!
• Imagine if we needed to run on a few thousand combinations of cache size, memory latency, etc....
• Intel / Microsoft use it - must be pretty good.
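The savings can be checked with quick arithmetic. This assumes one 100-million-instruction simulation point per phase, and that "respectively" pairs gzip with 50 billion and gcc with 80 billion instructions.

```python
INTERVAL_LEN = 100_000_000  # instructions per interval, from the earlier slide

def reduction_factor(phases, full_length, interval_len=INTERVAL_LEN):
    """How many times fewer instructions are simulated in detail."""
    simulated = phases * interval_len  # one simulation point per phase (assumed)
    return full_length / simulated

# (phases found, full 'reference' run length in instructions)
programs = {"gzip": (4, 50_000_000_000), "gcc": (8, 80_000_000_000)}
for name, (phases, full_length) in programs.items():
    print(name, reduction_factor(phases, full_length))
```

Both programs come out at roughly a 100-fold reduction, and that factor applies again for every architecture configuration simulated.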
Putting it all together
• First implementation of machine learning techniques to perform program phase analysis.
• Main thing to take away: applications (even complex ones) only exhibit a few unique behaviors – they're simply interleaved with each other over time.
• Using machine learning, we can find these behaviors with methods that are independent of architectural metrics.
• By doing so, we only need to simulate a few carefully chosen intervals, which greatly reduces simulation time.
Related / Future Work
• Other clustering algorithms with same data (multinomial clustering, regression trees) – k-means appears to do the best.
• “Un-tie” simulation points from binary – how could we do this?
• Map behavior back to source level after detecting it
• Now, we can use the same simulation points for different compilations / inputs of a program
• Accuracy is just about as good as with fixed intervals (Lau et al.)