Moderator: John Mellor-Crummey Department of Computer Science Rice University Programming...

Moderator: John Mellor-Crummey Department of Computer Science Rice University Programming Languages/Models and Compiler Technologies Microsoft Manycore Workshop June 21

Moderator:

John Mellor-Crummey

Department of Computer Science, Rice University

Programming Languages/Models and Compiler Technologies

Microsoft Manycore Workshop June 21, 2007

Panelists

• David August - Princeton University

• Saman Amarasinghe - Massachusetts Institute of Technology

• Guy Blelloch - Carnegie Mellon University

• Charles Leiserson - Massachusetts Institute of Technology

• Uzi Vishkin - University of Maryland, College Park

Architectural Challenges

• Significant parallelism

• Multiple kinds of parallelism

—cores

—ILP

—SIMD

• Diversity of cores

• Run-time throttling of cores for power management

• Memory hierarchy

—bandwidth

– near term: will continue to be a significant bottleneck

– long term: 3D stacked memory?

—long and often non-uniform memory latencies

—scratch pads

Roles of Parallel Programming Models

• Enhance programmer productivity through abstraction

• Manage platform resources to deliver performance

• Provide standard interface for platform portability

The Goal

Simpler ways of conceptualizing, expressing, debugging, and tuning scalable parallel programs

• Multiple models will be necessary

• Models will necessarily trade off simplicity, expressivity, relevance to legacy code, and performance

To Succeed, Parallel Programming Models Must …

• Be ubiquitous

—cross-platform

—at a minimum: laptops, SMP servers

—distributed memory clusters?

• Be expressive

• Be productive

—easy to write

—easy to read and maintain

—easy to reuse

• Have a promise of future availability and longevity

• Be efficient

• Be supported by tools

Simplifying Parallel Programming

A high-level parallel language should …

• Provide a global address space

—beware exposed buffering …

• Separate concerns: partitioning, mapping, and synchronization vs. algorithm specification

—“viscosity” comes from premature mingling of these issues

• Enable the programmer to manage locality at a high level

—locality = performance

—affinity between data and computation

– e.g. HPF’s “ON HOME” declarations
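A minimal sketch of the affinity idea, assuming a simple block distribution and an "owner computes" rule in the spirit of HPF’s “ON HOME”. The names `block_owner` and `owner_computes_scale` are hypothetical, and the ranks are simulated in a single process rather than run in parallel:

```python
def block_owner(i, n, p):
    """Rank that owns element i when n elements are block-distributed over p ranks."""
    block = -(-n // p)  # ceiling division: elements per rank
    return i // block


def owner_computes_scale(a, p, factor):
    """Scale each element of `a`; every simulated rank updates only the elements it owns."""
    n = len(a)
    result = list(a)
    work = {rank: [] for rank in range(p)}
    for rank in range(p):
        for i in range(n):
            # Analogue of an ON HOME test: compute where the data lives.
            if block_owner(i, n, p) == rank:
                result[i] = a[i] * factor
                work[rank].append(i)
    return result, work


if __name__ == "__main__":
    out, work = owner_computes_scale([1, 2, 3, 4, 5, 6, 7], p=3, factor=10)
    print(out)   # scaled values
    print(work)  # which indices each simulated rank touched
```

The algorithm (scale every element) is written once; the mapping of work to ranks is isolated in `block_owner`, so the distribution could be changed without touching the computation.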

Design Issues I

• Ultimate control vs. simplicity of use

—“library developers” vs. “productivity users”

– should it be the same language for both?

extensible language model (Sun’s Fortress)

kitchen sink model (X10)

• Implicit vs. explicit parallelism

—implicit parallelism is often more malleable

—better supports dynamic adaptation

• Compiler-assisted vs. compiler-centric

—Co-array Fortran and UPC

– user control over work decomposition, data movement, and synchronization

—HPF: compiler must deliver or all is lost

• Lazy vs. eager parallelism

—Cilk’s lazy parallelism provides a model for “scalable” binaries

—eager parallelism adds unnecessary overhead

Design Issues II

• Deterministic vs. non-deterministic models

—deterministic “clocked final model”

– Saraswat et al. (www.saraswat.org/cf.pdf)

• Static vs. dynamic scheduling

—dynamic scheduling will be increasingly important

– irregular computations, task parallelism

– adaptive scheduling in response to “core throttling”

• Cooperative vs. independent scheduling of work

—does the benefit of a shared cache outweigh the difficulty of using it?

– tightly synchronous vs. more loosely synchronous

• Scalable to distributed-memory ensembles?

—broad community probably only cares about tightly-coupled platforms

—some government and industry clients will always have extreme needs

• Importance of managing affinity between cores and data

—important for highest efficiency for library developers
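To illustrate why dynamic scheduling helps with irregular computations, here is a small simulation, not a real scheduler. It assumes task costs are known numbers and compares a static block partition against greedy self-scheduling, where the next task goes to whichever worker becomes idle first. The function names are hypothetical:

```python
import heapq


def static_makespan(costs, workers):
    """Assign contiguous blocks of tasks up front; return the slowest worker's finish time."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[w * chunk:(w + 1) * chunk]) for w in range(workers))


def dynamic_makespan(costs, workers):
    """Give each task, in order, to the worker that becomes idle first."""
    finish = [0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


if __name__ == "__main__":
    irregular = [8, 7, 6, 5, 1, 1, 1, 1]
    print(static_makespan(irregular, 2))   # 26
    print(dynamic_makespan(irregular, 2))  # 15
```

With the expensive tasks clustered at the front, the static partition leaves one worker with 26 units of work and the other with 4, while self-scheduling reaches the ideal 15. The same mechanism would absorb a core that slows down under throttling, which this sketch does not model.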

Transactions are not “THE” Answer

• Transactions are a piece of the puzzle: atomicity

• Other aspects of the parallel programming problem

—identifying concurrency

—partitioning work

—ordering actions
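A hedged sketch of the point above. Python has no built-in transactional memory, so a single `threading.Lock` stands in for an atomic block here; that substitution is an assumption of this example, as are the names `Account`, `transfer`, and `run_transfers`:

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance


_atomic = threading.Lock()  # stand-in for a transaction; not real transactional memory


def transfer(src, dst, amount):
    """Move `amount` atomically: no thread observes the intermediate state."""
    with _atomic:
        if src.balance < amount:
            return False
        src.balance -= amount
        dst.balance += amount
        return True


def run_transfers(n_threads=4, per_thread=1000):
    a, b = Account(10_000), Account(0)

    def worker():
        for _ in range(per_thread):
            transfer(a, b, 1)

    # Identifying concurrency and partitioning work are still manual decisions.
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    # Ordering: results are read only after every worker has finished.
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_transfers())  # (6000, 4000)
```

The atomic block only protects the two balance updates. Choosing the number of threads, dividing the transfers among them, and waiting before reading the result all lie outside it.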

Autotuning

• Seductive idea

• Very successful as a library-based approach

—FFTW, ATLAS, OSKI, …

• Much work needed to apply to applications rather than kernels

—huge search space

– progress in effective truncated search

—model guidance can be effective

—autotuning for parallelism

– dangerously close to automatic parallelization
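The basic empirical-search loop behind autotuning can be sketched as follows. This is an exhaustive search over a handful of candidates for a toy kernel with one tunable parameter, not how FFTW, ATLAS, or OSKI are implemented; `blocked_sum` and `autotune` are hypothetical names:

```python
import time


def blocked_sum(data, block):
    """Toy kernel with one tunable parameter: sum `data` in chunks of `block` elements."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total


def autotune(kernel, data, candidates, repeats=3):
    """Time every candidate on this machine; return the fastest one and all timings."""
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, block)
            best = min(best, time.perf_counter() - t0)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    data = list(range(100_000))
    best_block, timings = autotune(blocked_sum, data, [16, 256, 4096])
    print("fastest block size on this machine:", best_block)
```

One parameter with three values is trivial to search. A whole application exposes many interacting parameters, which is where the huge search space, truncated search, and model guidance mentioned above come in.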

Rice Experience: Lessons from HPF

• Good data and computation partitionings are essential

—without good partitionings, parallelism suffers

—flexible user control is essential

• Excess communication undermines scalability

—both frequency and volume must be right

—embrace user hints to guide communication placement and optimization

– e.g. HPF/JA directives: REFLECT, LOCAL, PIPELINE, etc.

• Single-processor efficiency is critical

—must use caches effectively on microprocessors

—Icache: beware of complex machine-generated code

—Dcache: beware of communication footprint

• Optimizing tightly-coupled algorithms can be hard

—if the compiler doesn’t optimize it, performance may be doomed!
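To make "both frequency and volume must be right" concrete, here is a sketch using the common latency-plus-bandwidth cost model. The model and the parameter values are assumptions for illustration, not measurements from the dHPF work, and `comm_time` is a hypothetical name:

```python
def comm_time(messages, total_bytes, latency_s, bytes_per_s):
    """Estimated communication time: a per-message cost plus a per-byte cost."""
    return messages * latency_s + total_bytes / bytes_per_s


if __name__ == "__main__":
    latency = 1e-6    # assumed 1 microsecond per message
    bandwidth = 1e9   # assumed 1 GB/s

    # Same volume (8000 bytes), very different frequency.
    fine = comm_time(messages=1000, total_bytes=8000, latency_s=latency, bytes_per_s=bandwidth)
    coarse = comm_time(messages=1, total_bytes=8000, latency_s=latency, bytes_per_s=bandwidth)
    print(fine, coarse, fine / coarse)
```

Under these assumed numbers, sending the data as 1000 small messages costs over a hundred times more than one aggregated message. Hints that let the compiler place and coalesce communication attack the frequency term directly.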

Rice Experience: HPF vs. Co-array Fortran

• Rice dHPF - a decade of investment in compiler technology

—not quite: government cut funding here too, just like architecture

—polyhedral code generation models (like Lethin described)

• Co-array Fortran for clusters

—a few years’ effort by a pair of students

• Result: Co-array Fortran bests HPF

—more expressive

—higher performance

—shorter time to solution

—currently, can be HARDER to program than MPI

Principal Compiler and Runtime Challenges

• Exploiting multiple levels of heterogeneous parallelism

• Choreographing parallelism, data movement, synchronization

• Managing memory hierarchy

—cache

—scratch pad

Warning: Don’t try this at home.

Programming Model Ecosystem Issues

• Semantic mismatch between programming model and execution model

• Debugging: data races and non-determinism

• Performance analysis: why isn’t performance scaling?

—insufficient parallelism

—parallelism is too fine-grained to be efficient

—architecture-level issues, e.g., false sharing
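The first two scaling problems can be put into numbers with simple models. `amdahl_speedup` is Amdahl's law; `speedup_with_task_overhead` is a simplified model assumed for this example (equal-sized tasks, a fixed per-task overhead, tasks run in rounds across the cores). Both function names are hypothetical:

```python
import math


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_with_task_overhead(work_s, tasks, cores, overhead_s):
    """Speedup when `work_s` seconds of work are split into equal tasks,
    each paying a fixed scheduling overhead of `overhead_s` seconds."""
    rounds = math.ceil(tasks / cores)
    parallel_time = rounds * (work_s / tasks + overhead_s)
    return work_s / parallel_time


if __name__ == "__main__":
    # Insufficient parallelism: 10% serial work caps 16 cores well below 16x.
    print(amdahl_speedup(0.9, 16))
    # Too fine-grained: same work, same cores, far more tasks.
    print(speedup_with_task_overhead(1.0, tasks=16, cores=16, overhead_s=1e-5))
    print(speedup_with_task_overhead(1.0, tasks=1_000_000, cores=16, overhead_s=1e-5))
```

With an assumed 10 microsecond overhead per task, sixteen coarse tasks come close to 16x, while a million tiny tasks spend most of their time on overhead and barely beat a single core. Architecture-level effects such as false sharing are not captured by either model and need hardware-level measurement.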

A Path Forward

• Kernel, benchmark, and application driven studies

—assess strengths and weaknesses of models

• Explore alternatives and evaluate their effects on

—simplicity

—expressiveness

—correctness

—performance