High-Productivity Languages for HPC: Compiler Challenges
David Chase
2005-10-25
Page
Compiler Challenges
Fortress
• New language
• Designed for productivity, high performance, abundant parallelism.
Contributors: Guy Steele, Jan-Willem Maessen, Eric Allen, David Chase, Sukyoung Ryu, Victor Luchangco, Christine Flood, Sam Tobin-Hochstadt, Yossi Lev, Cheryl McCosh, Joe Hallett, Carl Eastlund, Joao Dias
High productivity
• Speed of coding to scale (speed, fault tolerance)
• Ease of reuse
• Ease of debugging
• Ease of maintenance
• Portable performance (1P, CMT, SMP, NUMA, MPP)
• Ease of deployment (fragile dependence on DLLs?)
• Ease of system maintenance
• Larger pool of programmers
• Domain-specific extensions
High productivity features (that present compiler challenges)
• Garbage collection
• Transactional memory
• Fault tolerance
• Trustworthy compilers
• (Support for) cache-oblivious/work-stealing style
• Programming-by-contract
• Better “human factors”
GC, TM, FT, and the compiler
• The interaction between GC and the compiler is fairly well understood
  > Compiler can help with safepoints, barrier optimization, logging optimization, and pointer maps.
  > Runtime and compiler can be co-designed.
  > Compiler must be aware of runtime, concurrency, and memory-model issues.
• GC, TM, and FT are similar in many ways
  > Make copies of data
  > Monitor reads and writes
  > Profit from locality information
  > Tend to use read/write barriers, logging, and safepoints
  > Can we combine these? How can the optimizer help?
Example: Card-mark design/optimization
• Generational GC uses write-barriers to enforce old-young partition. Pointers from old to young must be treated specially.
• Traditional software write-barrier maps “card” X to heap addresses [X*256, X*256+256).
• A pointer written to address Y requires mark of card Y/256.
• Garbage collection looks for dirty cards, finds corresponding objects, and records actual old-to-young pointers.
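The card-marking scheme above can be sketched as a toy model. This is only an illustration under stated assumptions: the heap is modelled as a flat range of integer addresses, and the names `CardTable`, `cardOf`, `mark`, and `takeDirtyCards` are hypothetical, not taken from any particular collector.

```java
import java.util.ArrayList;
import java.util.List;

// Toy card table: one dirty flag per 256-byte "card" of heap.
// Addresses are plain longs; a real collector works on raw memory.
class CardTable {
    static final int CARD_SHIFT = 8;   // 2^8 = 256 bytes per card
    private final boolean[] dirty;

    CardTable(long heapBytes) {
        // Round up so a partial last card still gets a flag.
        dirty = new boolean[(int) ((heapBytes + 255) >> CARD_SHIFT)];
    }

    // Index of the card covering address y, i.e. y / 256.
    static int cardOf(long y) {
        return (int) (y >>> CARD_SHIFT);
    }

    // Write barrier: run after a pointer is stored at address y.
    void mark(long y) {
        dirty[cardOf(y)] = true;
    }

    // Collector side: report the dirty cards, clearing each flag as it goes.
    List<Integer> takeDirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int x = 0; x < dirty.length; x++) {
            if (dirty[x]) {
                result.add(x);
                dirty[x] = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(4096);
        table.mark(1000);                            // address 1000 lies in card 3
        System.out.println(table.takeDirtyCards());  // prints [3]
    }
}
```

In this model the collector would go on to find the objects overlapping each dirty card and record the actual old-to-young pointers; that step is omitted here.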
Example: Card-marks and safe points
• Marks for multiple writes to the same address are redundant, provided no GC can intervene.
• (Non-concurrent) GC can only occur at safepoints; therefore, if two writes to Y are not separated by a safepoint, one card mark can be eliminated.
Before:
    o.f = p
    mark(&o.f)
    o.f = q
    mark(&o.f)
After:
    o.f = p
    o.f = q
    mark(&o.f)
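One way to picture this elimination is as a pass over straight-line code. The sketch below is an assumption-laden illustration, not the compiler's actual pass: operations are modelled as strings ("store <loc>", "mark <loc>", "safepoint"), only a single basic block is handled, and the class name `RedundantMarkElimination` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RedundantMarkElimination {
    // Drops a mark when a later mark of the same location follows
    // with no safepoint in between. Scans backward so the last mark survives.
    static List<String> eliminate(List<String> ops) {
        List<String> kept = new ArrayList<>();
        Set<String> markedLater = new HashSet<>();
        for (int i = ops.size() - 1; i >= 0; i--) {
            String op = ops.get(i);
            if (op.equals("safepoint")) {
                markedLater.clear();          // a GC may run here: forget later marks
            } else if (op.startsWith("mark ")) {
                String loc = op.substring("mark ".length());
                if (!markedLater.add(loc)) {
                    continue;                 // a later mark already covers this one
                }
            }
            kept.add(op);
        }
        Collections.reverse(kept);            // restore program order
        return kept;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
            "store o.f", "mark o.f", "store o.f", "mark o.f");
        System.out.println(eliminate(before));
        // prints [store o.f, store o.f, mark o.f]
    }
}
```

Keeping the last mark rather than the first matches the slide's "after" code; either choice is sound in this model as long as the surviving mark executes before the next safepoint.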
Example: Card-marks, per-object
• Scanning often requires access to object header; might as well scan the whole object. Objects (unlike arrays) are usually small-ish.
• Marks for writes to different fields can be redundant, provided no safepoint intervenes.
Before:
    o.f1 = p
    mark(&o.f1)
    o.f2 = q
    mark(&o.f2)
After:
    o.f1 = p
    o.f2 = q
    mark(&o)
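The per-object variant can be sketched in the same toy style. The assumptions are loud ones: operations are strings of the form "store <obj>.<field>", "mark <obj>.<field>", or "safepoint"; variables are never reassigned, so aliasing is ignored; and the class name `PerObjectMarks` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PerObjectMarks {
    // Rewrites each field mark into a whole-object mark and keeps only the
    // last mark per object within each safepoint-free region.
    static List<String> coarsen(List<String> ops) {
        List<String> kept = new ArrayList<>();
        Set<String> markedLater = new HashSet<>();
        for (int i = ops.size() - 1; i >= 0; i--) {
            String op = ops.get(i);
            if (op.equals("safepoint")) {
                markedLater.clear();
            } else if (op.startsWith("mark ")) {
                String loc = op.substring("mark ".length());
                int dot = loc.indexOf('.');
                String obj = dot < 0 ? loc : loc.substring(0, dot);
                if (!markedLater.add(obj)) {
                    continue;                 // the object is marked later anyway
                }
                op = "mark " + obj;           // mark the object, not the field
            }
            kept.add(op);
        }
        Collections.reverse(kept);
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(coarsen(Arrays.asList(
            "store o.f1", "mark o.f1", "store o.f2", "mark o.f2")));
        // prints [store o.f1, store o.f2, mark o]
    }
}
```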
Example: Card-mark loop optimization
• When scanning cards, extend the range of addresses if the ending object is an array of pointers: card X then maps to [X*256-64, X*256+256).
• In the compiler, make writeBarrier( A[i] ) redundant with writeBarrier( A[i+K] ) for 0 <= K < 8.
• When a loop writing into an array of objects is unrolled by 8, all but one of the write barriers are removed (provided no safepoint intervenes).
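The arithmetic behind this can be checked with a small sketch. It assumes 8-byte pointers (so 8 elements span the 64 bytes of extra range), models addresses as longs, and uses hypothetical names (`ExtendedCardRange`, `scanned`, `oneMarkCoversGroup`).

```java
class ExtendedCardRange {
    static final int CARD_BYTES = 256;
    static final int SLACK_BYTES = 64;   // extra range scanned below the card
    static final int PTR_BYTES = 8;      // assumed pointer size

    static long cardOf(long addr) {
        return addr / CARD_BYTES;
    }

    // True if scanning card x, with the extended range
    // [x*256-64, x*256+256), visits address addr.
    static boolean scanned(long x, long addr) {
        return addr >= x * CARD_BYTES - SLACK_BYTES
            && addr < (x + 1) * CARD_BYTES;
    }

    // Checks that a single mark for A[i+7] covers the stores to A[i..i+7],
    // for a pointer array whose element 0 lives at address base.
    static boolean oneMarkCoversGroup(long base, int i) {
        long card = cardOf(base + (long) (i + 7) * PTR_BYTES);
        for (int k = 0; k < 8; k++) {
            if (!scanned(card, base + (long) (i + k) * PTR_BYTES)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The group A[0..7] straddles a card boundary here; one mark still suffices.
        System.out.println(oneMarkCoversGroup(248, 0));  // prints true
    }
}
```

The point of the extended range is that the group's last element may fall in a different card from its first; scanning 64 bytes below the marked card picks up the earlier elements.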
Example: card-mark youth optimization.
• Card marking is used to record creation of pointers from “old” objects to “young” objects.
• Newly-allocated objects are guaranteed young until a GC occurs.
• Until a safepoint intervenes, stores into a fresh object require no card marking.
Before:
    o = new O()
    o.f1 = p
    mark(&o.f1)
    o.f2 = q
    mark(&o.f2)
After:
    o = new O()
    o.f1 = p
    o.f2 = q
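A forward pass is enough to sketch the youth optimization. As a toy model only: operations are strings ("new <var>", "store <var>.<field>", "mark <var>.<field>", "safepoint"), every allocation is assumed to land in the young generation, variables are assumed never to be reassigned, and the class name `FreshObjectMarks` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FreshObjectMarks {
    // Drops marks on objects allocated since the last safepoint:
    // such objects are still young, so no old-to-young pointer can be created.
    static List<String> eliminate(List<String> ops) {
        List<String> kept = new ArrayList<>();
        Set<String> fresh = new HashSet<>();
        for (String op : ops) {
            if (op.equals("safepoint")) {
                fresh.clear();                // a GC may have promoted anything
            } else if (op.startsWith("new ")) {
                fresh.add(op.substring("new ".length()));
            } else if (op.startsWith("mark ")) {
                String loc = op.substring("mark ".length());
                int dot = loc.indexOf('.');
                String obj = dot < 0 ? loc : loc.substring(0, dot);
                if (fresh.contains(obj)) {
                    continue;                 // store into a fresh object: no mark
                }
            }
            kept.add(op);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(eliminate(Arrays.asList(
            "new o", "store o.f1", "mark o.f1", "store o.f2", "mark o.f2")));
        // prints [new o, store o.f1, store o.f2]
    }
}
```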
The OS is in the way. Do we trust our compilers enough to replace it?
• OS threads are slow and clunky.
• OS traps are slow and clunky.
• OS virtual memory is inflexible.
• Do we trust our compilers enough to let them take the place of the kernel/user boundary?
  > We lack consensus on “correctness” for parallel programs.
    + Many correct answers; optimization may shrink that set, but that’s OK.
  > Only an option for safe languages. What about C and C++?
The OS is in the way; synchronization
[Figure: JavaGrande Sync Method (larger is better). Y axis: total syncs per second, 0 to 6,000,000. X axis: number of threads, 0 to 256. Legend text: NB, Sun 1.3.1, IBM, Windows. Annotations: OS threads, User threads.]
The OS is in the way; wait/notifyAll
[Figure: JavaGrande Barrier Simple (larger is better). Y axis: total barriers per second, 0 to 400,000. X axis: number of threads, 0 to 256. Legend text: NB, Sun 1.3.1, IBM, Windows. Annotations: OS threads, User threads.]
Cache-oblivious computing
• Subdivide a problem on its largest dimension.
  > Good in theory, often good in practice
  > Automatically exploits size of caches, TLBs, working set
  > Good for work-stealing, work-dealing
  > Minimizes area of boundary between subproblems
  > Also N-processors oblivious
• But...
  > code generated by inlining at leaves seems to be “ugly”
  > work-stealing is spatial-locality-ignorant
  > can we pipeline between recursive nests?
  > can we map it to a cluster?
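A standard illustration of "subdivide on the largest dimension" is a recursive matrix transpose. The sketch below is not from the talk: the class name `CacheObliviousTranspose` and the leaf cutoff of 16 are assumptions, and Java's arrays-of-arrays layout only approximates the contiguous storage the technique is usually analysed on.

```java
class CacheObliviousTranspose {
    static final int LEAF = 16;   // assumed cutoff; below it, loop directly

    // Returns a new matrix holding the transpose of a rectangular src.
    static double[][] transpose(double[][] src) {
        int rows = src.length;
        int cols = rows == 0 ? 0 : src[0].length;
        double[][] dst = new double[cols][rows];
        transpose(src, dst, 0, rows, 0, cols);
        return dst;
    }

    // Transposes the block src[r0..r1) x [c0..c1) into dst, always
    // halving whichever dimension of the block is currently larger.
    static void transpose(double[][] src, double[][] dst,
                          int r0, int r1, int c0, int c1) {
        int rows = r1 - r0;
        int cols = c1 - c0;
        if (rows <= LEAF && cols <= LEAF) {
            for (int r = r0; r < r1; r++) {
                for (int c = c0; c < c1; c++) {
                    dst[c][r] = src[r][c];
                }
            }
        } else if (rows >= cols) {
            int mid = r0 + rows / 2;
            transpose(src, dst, r0, mid, c0, c1);
            transpose(src, dst, mid, r1, c0, c1);
        } else {
            int mid = c0 + cols / 2;
            transpose(src, dst, r0, r1, c0, mid);
            transpose(src, dst, r0, r1, mid, c1);
        }
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}};
        double[][] t = transpose(m);
        System.out.println(t.length + " x " + t[0].length);  // prints 3 x 2
    }
}
```

The two recursive calls in each branch touch disjoint blocks, so a work-stealing runtime could run them in parallel; the leaf loop is where the "ugly" inlined code mentioned above would end up.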
Programming-by-contract
• Said to help productivity, seems like it should.
• How should the optimizer use contract information?
  > Can we optimize contracts?
  > Can contracts help with library-level optimizations?
  > Should we only generate code for contracts that the compiler finds useful?
• Does the contract language allow us to say the right things?
Example contract for Vector
class Vector {
  Object[] items;
  int size;
  invariant { size <= items.length }

  int size() { return size; }

  void put(int i, Object o)
    requires { 0 <= i < size() }
  { items[i] = o; }

  Object get(int i)
    requires { 0 <= i < size() }
  { return items[i]; }
}
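Plain Java has no `requires` or `invariant` clauses, so one conservative reading of the contract above is as explicit runtime checks. The sketch below is only that reading: the class name `CheckedVector`, its constructor, the helper methods, and the choice of exception types are all assumptions.

```java
class CheckedVector {
    private final Object[] items;
    private final int size;

    CheckedVector(int capacity, int size) {
        this.items = new Object[capacity];
        this.size = size;
        checkInvariant();
    }

    // invariant { size <= items.length }
    private void checkInvariant() {
        if (size > items.length) {
            throw new IllegalStateException("invariant violated: size <= items.length");
        }
    }

    // requires { 0 <= i < size() }
    private void requireIndex(int i) {
        if (!(0 <= i && i < size())) {
            throw new IllegalArgumentException("precondition violated: 0 <= i < size()");
        }
    }

    int size() {
        return size;
    }

    void put(int i, Object o) {
        requireIndex(i);
        items[i] = o;
        checkInvariant();
    }

    Object get(int i) {
        requireIndex(i);
        return items[i];
    }

    public static void main(String[] args) {
        CheckedVector v = new CheckedVector(4, 2);
        v.put(1, "x");
        System.out.println(v.get(1));   // prints x
    }
}
```

Given the precondition and the invariant together, 0 <= i < size <= items.length, so the array bounds check on items[i] is provably redundant. That is the kind of use an optimizer might make of contract information.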
Human factors
• Error-reporting must be as informative as possible
  > Parsing
  > Type-inference
  > Exception stack traces (like Java, or Python)
• Error-reporting should be as early as possible
• Observability
  > Why is it slow?
  > Which threads have problems?
• How hard is it to say what I mean?
  > Expressive type systems can be a pain.
Our rotten tools; excerpt from a rant
I am betting that this is the name of some C++ function, run through a one-way hash to yield a unique name.
__ZNSt15basic_streambufIcSt11char_traitsIcEED4Ev
(Sarcastically) I assume a one-way hash, because it would just be so incredibly stupid not to fix the Unix™ linker for A DOZEN WHOLE YEARS to make it demangle C++ identifiers into something meaning something to the programmer....
How disgusted do I need to be, that my mail agent makes a swoosh sound IN STEREO, and my windows pour into their miniaturized form, but my linker still hands me missing symbols that look like line noise.
David Chase