Moderator:
John Mellor-Crummey
Department of Computer Science, Rice University
Programming Languages/Models and Compiler Technologies
Microsoft Manycore Workshop June 21, 2007
Panelists
• David August - Princeton University
• Saman Amarasinghe - Massachusetts Institute of Technology
• Guy Blelloch - Carnegie Mellon University
• Charles Leiserson - Massachusetts Institute of Technology
• Uzi Vishkin - University of Maryland, College Park
Architectural Challenges
• Significant parallelism
• Multiple kinds of parallelism
  - cores
  - ILP
  - SIMD
• Diversity of cores
• Run-time throttling of cores for power mgmt
• Memory hierarchy
  - bandwidth
    - near term: will continue to be a significant bottleneck
    - long term: 3D stacked memory?
  - long and often non-uniform memory latencies
  - scratch pads
Roles of Parallel Programming Models
• Enhance programmer productivity through abstraction
• Manage platform resources to deliver performance
• Provide standard interface for platform portability
The Goal
Simpler ways of conceptualizing, expressing, debugging, and tuning scalable parallel programs
• Multiple models will be necessary
• Models will necessarily trade off simplicity, expressivity, relevance to legacy code, and performance
To Succeed, Parallel Programming Models Must …
• Be ubiquitous
  - cross platform
  - at a minimum: laptops, SMP servers
  - distributed memory clusters?
• Be expressive
• Be productive
  - easy to write
  - easy to read and maintain
  - easy to reuse
• Have a promise of future availability and longevity
• Be efficient
• Be supported by tools
Simplifying Parallel Programming
A high-level parallel language should …
• Provide global address space
  - beware exposed buffering …
• Separate concerns: partitioning, mapping, and synchronization vs. algorithm specification
  - “viscosity” comes from premature mingling of these issues
• Enable programmer to manage locality at a high level
  - locality = performance
  - affinity between data and computation
    - e.g. HPF’s “ON HOME” declarations
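One way to read the affinity bullet is as an owner-computes rule: the iteration that updates element i runs wherever element i lives. The sketch below is a minimal Python illustration of that idea, not HPF syntax. The block distribution, the simulated processor count, and the helper names `owner` and `on_home_update` are all assumptions made for illustration.

```python
def owner(i, n, nprocs):
    """Return the rank that owns element i of an n-element array
    under a simple block distribution (an assumed layout)."""
    block = (n + nprocs - 1) // nprocs  # ceiling division: elements per rank
    return i // block


def on_home_update(a, b, nprocs):
    """Compute a[i] = 2 * b[i], letting each simulated rank execute only
    the iterations whose target element it owns (owner-computes)."""
    n = len(a)
    work = {p: [] for p in range(nprocs)}  # iterations executed per rank
    for p in range(nprocs):                # each simulated rank scans the loop
        for i in range(n):
            if owner(i, n, nprocs) == p:   # but runs only its "home" iterations
                a[i] = 2 * b[i]
                work[p].append(i)
    return work


b = list(range(10))
a = [0] * 10
work = on_home_update(a, b, nprocs=4)
print(work)  # prints: {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7, 8], 3: [9]}
```

The algorithm (`a[i] = 2 * b[i]`) never mentions ranks; only `owner` encodes the mapping, so the distribution could be changed without touching the loop body.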
Design Issues I
• Ultimate control vs. simplicity of use
  - “library developers” vs. “productivity users”
    - should it be the same language for both?
      - extensible language model (Sun’s Fortress)
      - kitchen sink model (X10)
• Implicit vs. explicit parallelism
  - implicit parallelism is often more malleable
  - better supports dynamic adaptation
• Compiler assisted vs. compiler-centric
  - Co-array Fortran and UPC
    - user control over work decomposition, data movement, and synchronization
  - HPF: compiler must deliver or all is lost
• Lazy vs. eager parallelism
  - Cilk’s lazy parallelism provides a model for “scalable” binaries
  - eager parallelism adds unnecessary overhead
Design Issues II
• Deterministic vs. non-deterministic models
  - deterministic “clocked final model”
    - Saraswat et al. (www.saraswat.org/cf.pdf)
• Static vs. dynamic scheduling
  - dynamic scheduling will be increasingly important
    - irregular computations, task parallelism
    - adaptive scheduling in response to “core throttling”
• Cooperative vs. independent scheduling of work
  - does benefit of shared cache outweigh difficulty of using it?
    - tightly synchronous vs. more loosely synchronous
• Scalable to distributed-memory ensembles?
  - broad community probably only cares about tightly-coupled platforms
  - some government and industry clients will always have extreme needs
• Importance of managing affinity between cores and data
  - important for highest efficiency for library developers
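The static vs. dynamic scheduling point can be made concrete with a small model. The sketch below compares finish times for an irregular workload under an up-front block split and under greedy self-scheduling. It is a simulation only: the task costs are invented, dispatch overhead is ignored, and the function names are illustrative assumptions.

```python
import heapq


def static_makespan(costs, nworkers):
    """Split tasks into contiguous blocks up front; the most heavily
    loaded worker determines the finish time."""
    block = (len(costs) + nworkers - 1) // nworkers
    loads = [sum(costs[w * block:(w + 1) * block]) for w in range(nworkers)]
    return max(loads)


def dynamic_makespan(costs, nworkers):
    """Hand the next task to whichever worker becomes free first
    (a greedy self-scheduling model that ignores dispatch overhead)."""
    free_at = [0] * nworkers          # min-heap of times each worker is free
    heapq.heapify(free_at)
    for c in costs:
        t = heapq.heappop(free_at)    # earliest available worker
        heapq.heappush(free_at, t + c)
    return max(free_at)


# Irregular workload: a few expensive tasks clustered at the front.
costs = [8, 8, 8, 8] + [1] * 12
print(static_makespan(costs, 4), dynamic_makespan(costs, 4))  # prints: 32 11
```

In this model the static split puts all four expensive tasks on one worker, while the dynamic policy reaches the ideal of total work divided by worker count. A real runtime would pay some overhead per dispatched task, which this sketch does not capture.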
Transactions are not “THE” Answer
• Transactions are a piece of the puzzle: atomicity
• Other aspects of the parallel programming problem
  - identifying concurrency
  - partitioning work
  - ordering actions
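A small sketch may help separate these concerns. Here a `threading.Lock` stands in for an atomic region; it is not a transactional memory system, only an assumed substitute for illustration. The atomic update is one line, while identifying independent work, partitioning it, and ordering the final read remain the programmer's job.

```python
import threading

total = 0
lock = threading.Lock()  # stand-in for an atomic/transactional region


def add_chunk(chunk):
    """Sum a chunk privately, then publish the result atomically."""
    global total
    partial = sum(chunk)          # independent work: no shared state touched
    with lock:                    # atomicity: read-modify-write as one step
        total += partial


data = list(range(1000))
nthreads = 4
# Partitioning work: decided here, not by the atomic region.
chunks = [data[i::nthreads] for i in range(nthreads)]
threads = [threading.Thread(target=add_chunk, args=(c,)) for c in chunks]
for t in threads:
    t.start()
# Ordering actions: the result is only meaningful after every thread joins.
for t in threads:
    t.join()
print(total)  # prints: 499500
```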
Autotuning
• Seductive idea
• Very successful as a library-based approach
  - FFTW, ATLAS, OSKI, …
• Much work needed to apply to applications rather than kernels
  - huge search space
    - progress in effective truncated search
  - model guidance can be effective
  - autotuning for parallelism
    - dangerously close to automatic parallelization
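The search-space bullets can be illustrated with a toy tuner. The sketch below contrasts exhaustive search with a truncated, model-guided search over tile sizes. Everything in it is an assumption for illustration: `measured_cost` is a synthetic curve standing in for timing a real kernel, and `model_estimate` is a deliberately crude analytic model.

```python
def measured_cost(tile):
    """Hypothetical stand-in for timing a kernel built with this tile size.
    A real autotuner would compile and run the kernel here."""
    return (tile - 48) ** 2 + 100   # synthetic curve with its minimum at 48


def model_estimate(tile, cache_elems=4096):
    """Crude analytic model: prefer the square tile closest to cache capacity."""
    return abs(tile * tile - cache_elems)


def exhaustive_search(candidates):
    """Measure every candidate; returns (best_tile, evaluations)."""
    best = min(candidates, key=measured_cost)
    return best, len(candidates)


def model_guided_search(candidates, budget):
    """Truncated search: rank candidates with the model, then measure
    only the `budget` most promising ones."""
    ranked = sorted(candidates, key=model_estimate)
    shortlist = ranked[:budget]
    best = min(shortlist, key=measured_cost)
    return best, len(shortlist)


candidates = list(range(8, 129, 8))          # tile sizes 8, 16, ..., 128
print(exhaustive_search(candidates))         # prints: (48, 16)
print(model_guided_search(candidates, 4))    # prints: (48, 4)
print(model_guided_search(candidates, 2))    # prints: (56, 2)
```

With a budget of 4 the guided search finds the same tile as the exhaustive one using a quarter of the measurements; with a budget of 2 the imperfect model causes it to miss the optimum, which is the risk of truncating too aggressively.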
Rice Experience: Lessons from HPF
• Good data and computation partitionings are essential
  - without good partitionings, parallelism suffers
  - flexible user-control is essential
• Excess communication undermines scalability
  - both frequency and volume must be right
  - embrace user hints to guide communication placement and optimization
    - e.g. HPF/JA directives: REFLECT, LOCAL, PIPELINE, etc.
• Single processor efficiency is critical
  - must use caches effectively on microprocessors
  - Icache: beware of complex machine-generated code
  - Dcache: beware of communication footprint
• Optimizing tightly-coupled algorithms can be hard
  - if the compiler doesn’t optimize it, performance may be doomed!
Rice Experience: HPF vs. Co-array Fortran
• Rice dHPF - a decade of investment in compiler technology
  - not quite, govt cut funding here too, just like architecture
  - polyhedral code generation models (like Lethin described)
• Co-array Fortran for clusters
  - a few years’ effort by a pair of students
• Result: Co-array Fortran bests HPF
  - more expressive
  - higher performance
  - shorter time to solution
  - currently, can be HARDER to program than MPI
Principal Compiler and Runtime Challenges
• Exploiting multiple levels of heterogeneous parallelism
• Choreographing parallelism, data movement, synchronization
• Managing memory hierarchy
  - cache
  - scratch pad
Warning: Don’t try this at home.
Programming Model Ecosystem Issues
• Semantic mismatch between programming model and execution model
• Debugging: data races and non-determinism
• Performance analysis: why isn’t performance scaling?
  - insufficient parallelism
  - parallelism is too fine grain to be efficient
  - architecture level issues, e.g., false sharing
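False sharing is easiest to see as layout arithmetic. The sketch below maps per-thread counters to cache lines for a packed and a padded layout. It assumes a 64-byte line and a line-aligned array base, both of which vary by platform; the helper names are illustrative. It shows only the address arithmetic, not the performance effect.

```python
CACHE_LINE = 64  # bytes; an assumed line size, common but not universal


def cache_lines(nthreads, stride):
    """Map each thread's counter (laid out `stride` bytes apart, starting
    at a line-aligned offset 0) to the index of the cache line holding it."""
    return [(t * stride) // CACHE_LINE for t in range(nthreads)]


def shares_line(lines):
    """True if any two threads' counters fall in the same cache line."""
    return len(set(lines)) < len(lines)


packed = cache_lines(4, stride=8)            # adjacent 8-byte counters
padded = cache_lines(4, stride=CACHE_LINE)   # one counter per line
print(packed, shares_line(packed))  # prints: [0, 0, 0, 0] True
print(padded, shares_line(padded))  # prints: [0, 1, 2, 3] False
```

In the packed layout every thread writes to the same line, so updates to logically independent counters still contend at the hardware level; padding trades memory for separate lines.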
A Path Forward
• Kernel, benchmark, and application driven studies
  - assess strengths and weaknesses of models
• Explore alternatives & evaluate their effects on
  - simplicity
  - expressiveness
  - correctness
  - performance