A Framework for Reasoning about Inherent Parallelism in Modern
Object-Oriented Languages
Andrew Craik
BASc Comp Eng Distinction, University of Waterloo, June 2007
A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy
February 2011
Principal Supervisor: Dr. Wayne Kelly
Associate Supervisor: Professor Paul Roe
Discipline of Computer Science
Faculty of Science and Technology
Queensland University of Technology
Brisbane, Queensland, AUSTRALIA
© Copyright by Andrew Craik 2011. All Rights Reserved.
The author hereby grants permission to the Queensland University of Technology to
reproduce and redistribute publicly paper and electronic copies of this thesis document
in whole or in part.
Keywords
programming languages; Ownership Types; parallelization; inherent parallelism; conditional parallelism; effect system.
Abstract
With the emergence of multi-core processors into the mainstream, parallel programming
is no longer the specialized domain it once was. There is a growing need for systems
to allow programmers to more easily reason about data dependencies and inherent
parallelism in general purpose programs. Many of these programs are written in popular
imperative programming languages like Java and C#.
In this thesis I present a system for reasoning about side-effects of evaluation in an
abstract and composable manner that is suitable for use by both programmers and
automated tools such as compilers. The goal of developing such a system is to both
facilitate the automatic exploitation of the inherent parallelism present in imperative
programs and to allow programmers to reason about dependencies which may be lim-
iting the parallelism available for exploitation in their applications. Previous work on
languages and type systems for parallel computing has tended to focus on providing
the programmer with tools to facilitate the manual parallelization of programs; pro-
grammers must decide when and where it is safe to employ parallelism without the
assistance of the compiler or other automated tools. None of the existing systems
combine abstraction and composition with parallelization and correctness checking to
produce a framework which helps both programmers and automated tools to reason
about inherent parallelism.
In this work I present a system for abstractly reasoning about side-effects and data
dependencies in modern, imperative, object-oriented languages using a type and effect
system based on ideas from Ownership Types. I have developed sufficient conditions
for the safe, automated detection and exploitation of a number of task, data, and loop
parallelism patterns in terms of ownership relationships.
To validate my work, I have applied my ideas to the C# version 3.0 language to produce
a language extension called Zal. I have implemented a compiler for the Zal language
as an extension of the GPC# research compiler as a proof of concept of my system.
I have used it to parallelize a number of real-world applications to demonstrate the
feasibility of my proposed approach. In addition to this empirical validation, I present
an argument for the correctness of the type system and language semantics I have
proposed, as well as sketches of proofs of the correctness of the proposed sufficient
conditions for parallelization.
Contents
1 Introduction 1
1.1 Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Explicit vs. Inherent Parallelism . . . . . . . . . . . . . . . . . . 3
1.1.2 Developing Parallel Programs . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Traditional Data Dependency Analysis . . . . . . . . . . . . . . . 7
1.2.1.1 Scalar and Local Dependency Analysis . . . . . . . . . 7
1.2.1.2 Pointer/Reference May-Alias Analysis . . . . . . . . . . 8
1.2.1.3 Array Data Dependency Analysis . . . . . . . . . . . . 8
1.2.1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Type Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 My Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4.1 Object-orientation . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4.2 Ownership Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Reasoning About Parallelism 21
2.1 Side-effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.1 Abstracting Effects . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.1.2 Effect System Details . . . . . . . . . . . . . . . . . . . . . . . . 25
2.1.3 Effect System Complications due to Object-Orientation . . . . . 27
2.2 Detecting Data Dependencies . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Task Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.1 Sufficient Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4 Loop Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.1 Data Parallel Loops . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1.1 Foreach Loops . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.1.2 Enhancing the Foreach Loop . . . . . . . . . . . . . . . 35
2.4.1.3 The Enhanced Foreach Loop . . . . . . . . . . . . . . . 36
2.4.1.4 Loop Rewriting . . . . . . . . . . . . . . . . . . . . . . 37
2.4.2 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2.1 Foreach Loops . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2.2 Enhanced Foreach Loops . . . . . . . . . . . . . . . . . 41
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Realization of the Abstract System 43
3.1 Encapsulation Enforcement . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Ownership Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2.1 Generics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2.2 Subtyping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.3 Type Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Side-effects Using Contexts . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.1 Heap Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.2 Stack Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Effect Disjointness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.1 Heap Effect Disjointness . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.2 Facilitating Upwards Data Access . . . . . . . . . . . . . . . . . 59
3.4.3 Context Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.4.4 Runtime Ownership Tracking . . . . . . . . . . . . . . . . . . . . 62
3.4.5 Stack Effect Disjointness . . . . . . . . . . . . . . . . . . . . . . . 64
3.5 Realizing the Sufficient Conditions for Parallelism . . . . . . . . . . . . . 65
3.5.1 Task Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.5.2 Data Parallel Loops . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.3 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4 Application to an Existing Language 69
4.1 Language Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Choice of Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 Syntactic Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.1 Basic Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.1.1 Class-Level Context Parameters . . . . . . . . . . . . . 75
4.3.1.2 Method-Level Context Parameters . . . . . . . . . . . . 76
4.3.1.3 Context Constraints . . . . . . . . . . . . . . . . . . . . 78
4.3.1.4 Method Effect Declarations . . . . . . . . . . . . . . . . 79
4.3.2 Subroutine Constructs . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.2.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.2.2 Indexers . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.2.3 Delegates . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.3.2.4 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3.2.5 Anonymous Methods . . . . . . . . . . . . . . . . . . . 89
4.3.2.6 Lambda Expressions . . . . . . . . . . . . . . . . . . . . 93
4.3.2.7 Extension Methods . . . . . . . . . . . . . . . . . . . . 95
4.3.3 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.3.1 Ref and Out Call Parameters . . . . . . . . . . . . . . . 96
4.3.3.2 Partial Types . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.3.3 Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3.3.4 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.3.3.5 User Defined Value Types . . . . . . . . . . . . . . . . . 99
4.3.3.6 Static Classes . . . . . . . . . . . . . . . . . . . . . . . 101
4.3.3.7 Nullable Types . . . . . . . . . . . . . . . . . . . . . . . 102
4.3.3.8 Existing Types . . . . . . . . . . . . . . . . . . . . . . . 103
4.4 Statics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.1 Static Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5 LINQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5 Formalization 111
5.1 Type System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.1.1 Type Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.1.2 Effect Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.2 Proof of Ancestor Tree Search Algorithm . . . . . . . . . . . . . . . . . 116
5.3 Proof of Condition Correctness . . . . . . . . . . . . . . . . . . . . . . . 119
5.3.1 Task Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.3.2 Data Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.3.3 Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6 Implementation 131
6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.2 Implementation Attribution . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.3 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3.1 Scanner & Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.3.2 Abstract Syntax Tree . . . . . . . . . . . . . . . . . . . . . . . . 137
6.3.3 Design of the Pluggable Type System . . . . . . . . . . . . . . . 139
6.3.3.1 Generic Types . . . . . . . . . . . . . . . . . . . . . . . 140
6.3.3.2 Extracting an Abstract Type Parameter Infrastructure 141
6.3.4 Effect Calculation and Validation . . . . . . . . . . . . . . . . . . 146
6.3.4.1 Heap Effects . . . . . . . . . . . . . . . . . . . . . . . . 146
6.3.4.2 Stack Effects . . . . . . . . . . . . . . . . . . . . . . . . 148
6.3.4.3 Loop Body Rewriting . . . . . . . . . . . . . . . . . . . 151
6.4 Zal Code Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.4.1 Runtime Ownership Implementation and Tracking . . . . . . . . 155
6.4.1.1 Ownership Implementation . . . . . . . . . . . . . . . . 155
6.4.1.2 Properties & Indexers . . . . . . . . . . . . . . . . . . . 162
6.4.1.3 Sub-contexts . . . . . . . . . . . . . . . . . . . . . . . . 164
6.4.1.4 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.4.1.5 Statics . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.4.2 Enhanced Foreach Loop . . . . . . . . . . . . . . . . . . . . . . . 171
6.5 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.5.1 Context Relationship Testing . . . . . . . . . . . . . . . . . . . . 174
6.5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.5.2.1 Data Parallelism . . . . . . . . . . . . . . . . . . . . . . 178
6.5.2.2 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.5.2.3 Task Parallelism . . . . . . . . . . . . . . . . . . . . . . 181
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7 Validation 183
7.1 Test Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.2 The Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.2.1 Ray Tracer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.2.2 Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.2.3 Bank Transaction System . . . . . . . . . . . . . . . . . . . . . . 188
7.2.4 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
7.3 The Annotation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.4 The Results of Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.1 Ray Tracer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.4.2 Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.4.3 Bank Transaction System . . . . . . . . . . . . . . . . . . . . . . 197
7.4.4 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.5 The Results of Compilation . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.5.1 Ray Tracer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
7.5.2 Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.5.3 Bank Transaction System . . . . . . . . . . . . . . . . . . . . . . 200
7.5.4 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.6 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
7.6.1 Runtime Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7.6.2 Memory Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 205
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8 Comparison with Related Work 213
8.1 Background to Type Systems and Data Flow Analysis . . . . . . . . . . 214
8.2 Traditional Data Dependency Analysis . . . . . . . . . . . . . . . . . . . 217
8.2.1 Array Dependence Analysis . . . . . . . . . . . . . . . . . . . . . 217
8.2.2 May-Alias Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 218
8.3 Automatically Parallelizing Compilers . . . . . . . . . . . . . . . . . . . 219
8.4 Type and Effect Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8.4.1 FX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.4.2 Ownership Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
8.4.2.1 Ownership Side-Effects . . . . . . . . . . . . . . . . . . 223
8.4.2.2 Applications to Parallelism . . . . . . . . . . . . . . . . 224
8.4.2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.4.3 Universe Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.4.3.1 Applications to Parallelism . . . . . . . . . . . . . . . . 226
8.4.4 Ownership Domains . . . . . . . . . . . . . . . . . . . . . . . . . 226
8.4.4.1 Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8.4.5 Boxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
8.4.6 Uniqueness, Read-Only References and Immutability . . . . . . . 228
8.4.7 SafeJava . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
8.4.8 Deterministic Parallel Java . . . . . . . . . . . . . . . . . . . . . 229
8.5 Logics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.5.1 Hoare & Separation Logic . . . . . . . . . . . . . . . . . . . . . . 231
8.6 Programming Languages for Parallelism . . . . . . . . . . . . . . . . . . 232
8.6.1 Haskell & other Functional Languages . . . . . . . . . . . . . . . 232
8.6.2 Cyclone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.6.3 Scala . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8.6.4 High Productivity Computing Languages . . . . . . . . . . . . . 236
8.6.5 Spec# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.7 Alternative Concurrency Abstractions . . . . . . . . . . . . . . . . . . . 238
8.7.1 Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.7.1.1 Task Parallel Library & Parallel LINQ . . . . . . . . . 239
8.7.1.2 OpenMP . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.7.2 Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
8.7.2.1 Actor Model . . . . . . . . . . . . . . . . . . . . . . . . 240
8.7.2.2 MPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.7.2.3 Jade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.8 Object-Oriented Paradigm Considerations . . . . . . . . . . . . . . . . . 242
8.8.1 Design Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
9 Conclusion & Future Work 245
9.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
9.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
9.3.1 Memory Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3.2 Annotation Overhead . . . . . . . . . . . . . . . . . . . . . . . . 248
9.3.2.1 Ownership Inference . . . . . . . . . . . . . . . . . . . . 248
9.3.2.2 Ownership Transfer . . . . . . . . . . . . . . . . . . . . 248
9.3.2.3 Temporary Owners for Transient Objects . . . . . . . . 249
9.3.3 Loop Parallelism Limitations . . . . . . . . . . . . . . . . . . . . 250
9.3.3.1 Improved Handling of Collections . . . . . . . . . . . . 250
9.3.3.2 Light’s Associativity Test . . . . . . . . . . . . . . . . . 250
9.3.4 Language Limitations . . . . . . . . . . . . . . . . . . . . . . . . 252
9.3.4.1 Liberalization of the Stack Model . . . . . . . . . . . . 252
9.3.4.2 Handling Unsafe Code Blocks . . . . . . . . . . . . . . 252
9.3.4.3 Multiple Ownerships to Model Communication Channels 253
9.4 Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Bibliography 255
List of Figures
1.1 A diagram showing the relationships between algorithms, programs, and parallelism. The process of parallelization is the process of making implicit parallelism explicit so that it can be exploited. . . . . . . . . 4
1.2 Abstract memory regions which can be named as effects. The areas of overlap between regions indicate possible data dependencies. . . . . . . . . 13
1.3 Memory regions based on an object-oriented program's representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representation. The inclusion of one triangle within another indicates that a data dependency could exist if they were named as effects on different operations. . . . . . . . . 15
2.1 Memory regions based on an object-oriented program's representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representations. The inclusion of one triangle within another is the basis of effect abstraction and dependency detection. . . . . . . . . 26
2.2 The overlapping of the execution of loop iterations by using S A through S D as pipeline stages. Notice how the iterations ripple through the different stages as time progresses. . . . . . . . . 39
2.3 Graph to help visualize dependencies permitted between specific executions of pipeline stages. S A through S D represent pipeline stages ordered from left to right and loop iterations are arranged from top to bottom. . . . . . . . . 39
3.1 Ownership relationships between contexts at runtime used as an example of capturing context disjointness using sub-contexts. . . . . . . . . 59
4.1 The different components that are used to compile a Zal program and exploit its inherent parallelism at runtime. . . . . . . . . 70
5.1 The structure of the proof of sufficient conditions for parallelism correctness presented in Chapter 5. The proofs of the items highlighted in red are cited in the literature rather than re-derived. . . . . . . . . 112
5.2 The helper function, ancestors, used to test for context disjointness. Note that Γ represents the type checking environment and Γ(l1) obtains the parent of context l1. . . . . . . . . 117
5.3 The base case for the ancestor algorithm inductive proof: a and b are the same node. . . . . . . . . 118
5.4 The induction step of the ancestor algorithm proof: b is a parent of the nth parent of a. . . . . . . . . 118
5.5 The relationship between contexts c1, c2, and object x. . . . . . . . . . . 124
5.6 The relationships between e1, e2, and x as used in the proof of effect disjointness. . . . . . . . . 126
5.7 The relationships of e1, e2, r, and x and the disjointness of k1 and r for the proof of effect disjointness. . . . . . . . . 127
6.1 A diagram showing the different parts of the compiler we have written. The Zal-only operations are shown in white boxes; these steps are skipped by the normal C# compiler. . . . . . . . . 134
6.2 An illustration of the Zal compiler source directory structure where class implementation is split across source files using partial classes stored in subdirectories for each stage of compilation. . . . . . . . . 138
6.3 This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5. . . . . . . . . 142
6.4 This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5 after the amalgamated type parameter wrappers have been added. . . . . . . . . 145
6.5 The effect rule for a statement block. . . . . . . . . . . . . . . . . . . . . 148
7.1 The scene rendered by the ray tracer from Microsoft's Samples for Parallel Computing with the .NET Framework 4; note the reflective surfaces which increase the rendering complexity. . . . . . . . . 186
7.2 Speed-up graph for the ray tracer example. . . . . . . . . . . . . . . . . 207
7.3 Speed-up graph for the calculator example. . . . . . . . . . . . . . . . . 207
7.4 Speed-up graph for the bank transaction processing system. . . . . . . . 208
7.5 Speed-up graph for the spectral methods example. . . . . . . . . . . . . 208
7.6 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the ray tracer example. . . . . . . . . 209
7.7 Graph showing the speedup in the calculator example when the runtime ownership tracking systems are enabled. . . . . . . . . 209
7.8 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the simplified bank transaction system. . . . . . . . . 210
7.9 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the spectral methods example. . . . . . . . . 210
7.10 Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. The O(n) memory usage of the Dijkstra Views implementation can be clearly seen as the number of nodes increases. . . . . . . . . 211
7.11 Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. Note that the height of the ownership tree does not increase with the data size in this example and so the Dijkstra Views memory consumption grows proportional to the number of transactions. . . . . . . . . 211
9.1 Illustration of two singly owned objects a and b communicating via a jointly owned context, labelled a&b. . . . . . . . . 253
List of Tables
3.1 Table showing the runtime complexity of object creation and relationship testing. Note that n is the height of the ownership tree. . . . . . . . . 64
4.1 The four context relationships which can be stipulated in a Zal context constraint clause and their meanings. . . . . . . . . 78
6.1 Measures of the relative sizes of the Ownership Extensions and the GPC# compiler in terms of physical lines of code (SLOC-P), logical lines of code (SLOC-L), and cyclomatic complexity (McCabe VG) [76]. . . . . . . . . 133
6.2 The AST nodes added to represent the contexts, context constraints, effect declarations, and enhanced foreach loops. . . . . . . . . 139
6.3 Descriptions of the AST node types which appear in Figure 6.3. . . . . . . . . 141
6.4 The interfaces written to provide an abstract structure for the implementation of type parameters. . . . . . . . . 143
6.5 The classes used to wrap up lists of type parameters so that different parameter lists do not need to be aware of one another and so that parameters can be checked and resolved collectively. . . . . . . . . 144
6.6 Key methods from the interfaces used to abstract type parameters to create a pluggable type system. . . . . . . . . 147
6.7 Table of custom attributes used to store declared context parameters and effect information. These attributes are emitted into C# source code produced by the Zal compiler. . . . . . . . . 160
6.8 Enhanced foreach loop body delegates based on the optional enhancements declared in the loop header. *Note that the loop body without either of the optional enhancements is a traditional foreach loop and can be handled using the IEnumerable interface as usual. . . . . . . . . 172
7.1 Table showing the number of logical lines of source code modified for each of the examples during the annotation process. . . . . . . . . 195
7.2 Table showing the number of method definitions modified for each of the examples during the annotation process. . . . . . . . . 195
Code Listings
1.1 A small program snippet used to illustrate the idea of implicit parallelism. . . . . . . . . 5
1.2 An example of a pair of nested loops used in the discussion of traditional array data dependency analysis. . . . . . . . . 9
1.3 A code snippet showing the Java 1.1.1 getSigners bug. . . . . . . . . 16
2.1 A simple stereotypical data parallel foreach loop. . . . . . . . . . . . . 32
2.2 A generic foreach loop. . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 A generic foreach loop with its body abstracted to a method on the element. . . . . . . . . 34
2.4 The body of a generic foreach loop extracted as a method on the element type T. . . . . . . . . 35
2.5 A for loop whose body updates elements of a collection in place and makes use of the index of elements in the collection. . . . . . . . . 35
2.6 An example of the syntax for an enhanced foreach loop equivalent to the for loop in Listing 2.5. . . . . . . . . 35
2.7 A simple enhanced foreach loop used to develop sufficient conditions for parallelism which can be generalized to all enhanced foreach loops. . . . . . . . . 36
2.8 A generic foreach loop body consisting of four statements. . . . . . . . 38
3.1 A code snippet showing the field and method signature implicated in the Java 1.1.1 getSigners bug. . . . . . . . . 44
3.2 A simple stack implementation showing how I annotate classes with context parameters. . . . . . . . . 47
3.3 An example of a class with both generic type parameters and context parameters. . . . . . . . . 48
3.4 A simple generic stack implementation showing how I annotate classes with context parameters and how they interact with generic type parameters. . . . . . . . . 49
3.5 An example showing how a child class maps its formal context parameters to those of its parent. . . . . . . . . 50
3.6 An example showing how allowing variance in type parameters can create holes in the type system. . . . . . . . . 51
3.7 Algorithm for unioning two sets of effects, set1 and set2. Note that the + and − operations are just set addition and subtraction. . . . . . . . . 55
3.8 A simple stack implementation illustrating method and constructor effect declarations. Method-level context parameters are specified using a notation inspired by C#'s method-level generic type parameters. . . . . . . . . 56
3.9 An example of context constraint syntax on a class with context parameters. . . . . . . . . 62
3.10 The algorithm for testing if two contexts are disjoint. Each object has a list of ancestor contexts which can be indexed into using []. The || operator has its usual mathematical meaning of magnitude. . . . . . . . . 63
3.11 A simple stereotypical data parallel foreach loop. . . . . . . . . . . . . 66
4.1 An example of a hashtable which implements a visitor interface which allows the values to be traversed in parallel provided the k and v contexts are disjoint. . . . . . . . . 71
4.2 The C# implementation of the hashtable example shown in Listing 4.1. . . . . . . . . 71
4.3 An example of a class parameterized with formal context parameters owner, a, and b. . . . . . . . . 76
4.4 An example of a field with actual context parameters this, world, and b. 76
4.5 An example of a class extending a class which is parameterized with context parameters. . . . . . . . . 76
4.6 A method definition with formal context parameters. . . . . . . . . . . 77
4.7 An example of the inference of type parameters to a generic method. . 77
4.8 An example of a class annotated with context parameters and context constraints using Zal's syntax. . . . . . . . . 79
4.9 An example of an instance method annotated with effect declarations and context constraint clauses. . . . . . . . . 80
4.10 An example of a property being used to read and write a field. . . . . . 81
4.11 Code snippet which shows how the properties in Listing 4.10 could be implemented using methods. . . . . . . . . 82
4.12 An example of a property annotated with read and write effects. . . . . 83
4.13 An example of an automatic property which does not require an explicit field or accessor implementations. . . . . . . . . 83
4.14 An example of an indexer used to convert day names into numerical days of the week from a defined starting point. . . . . . . . . 84
4.15 An example of how to annotate an indexer with context parameters and effects. The classes annotated were previously shown in Listing 4.14. . . . . . . . . 85
4.16 The syntax of a delegate taking two Objects and returning an Object. 86
4.17 An example of the use of context parameters and effect declarations on the delegate originally shown in Listing 4.16. . . . . . . . . 86
4.18 An example of a simple C# event using the delegate type EventHandler. . . . . . . . . 88
4.19 An example of a simple C# event previously shown in Listing 4.18 now annotated with context parameters and effect declarations. . . . . . . . . 88
4.20 An example of a simple Reverse Polish Notation calculator which defines binary operations to be applied to the stack as a delegate. The calculator supplies two operations, add and sub, via anonymous method declarations. . . . . . . . . 89
4.21 An example of an ownership annotated Reverse Polish Notation calculator class based on the original C# example shown in Listing 4.20. Note the effect declarations added to the anonymous methods. . . . . . . . . 90
4.22 A code listing showing the capture of local variables from two different scopes in the anonymous method returned from the operation method. . . . . . . . . 91
4.23 An example showing how the C# compiler would implement the example shown in Listing 4.22 using private inner classes. . . . . . . . . 91
4.24 An example showing how the implementation of the example shown in Listing 4.22 would be annotated with context parameters and effect declarations. . . . . . . . . 92
4.25 The outer variable capture example from Listing 4.24 annotated with context parameters and effect declarations. . . . . . . . . 93
4.26 An example code snippet showing the difference in the typing of anonymous methods and lambda expressions. The anonymous method fails to compile because the int i parameter cannot be implicitly converted to a short while the lambda fails to compile because of the delegate it is bound to taking an int parameter; it would succeed if the delegate took a short as a parameter. . . . . . . . . 94
4.27 An example of an ownership annotated Reverse Polish Notation calculator class based on the original C# example shown in Listing 4.20 with the anonymous methods replaced by lambda expressions. . . . . . . . . 95
4.28 An example of an extension method which adds a WordCount method to the interface of string. . . . . . . . . 95
4.29 An example of an extension method annotated with context parameters and effect declarations showing reads of the extension parameter generating reads of this. . . . . . . . . 96
4.30 Examples of single, jagged, and multi-dimensional arrays and their annotation with context parameters. . . . . . . . . 99
4.31 A code fragment showing the copy on assignment behavior of a user-defined value type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.32 An example of a struct in the form of a two dimensional coordinate in a coordinate system. Note the struct holding a reference to the CoordinateSystem class. . . . . . . . . 101
4.33 The Point value type shown in Listing 4.32 annotated with context parameters and effect declarations. . . . . . . . . 102
4.34 An example of the syntax for declaring an effectTemplate to inject context and effect information onto an existing type. . . . . . . . . 103
4.35 An example of the syntax for declaring an effectTemplate which does not add formal context parameters to a type, but still injects context and effect information onto member declarations it contains. . . . . . . . . 104
4.36 An example of an effectTemplate which adds effect declarations to some of the methods of the ICollection<T>. . . . . . . . . 105
4.37 An example of a read of a static field of the DataStore class causing a read of the DataStore type context. . . . . . . . . 106
4.38 The read of Child.value actually reads the value field declared on the Parent class and so results in a read of the Parent type context. . . . . . . . . 106
4.39 An example of a generic type parameter being used in the declaration of a static. . . . . . . . . 107
4.40 An example of a static field whose type is constructed using class context parameters. . . . . . . . . 107
4.41 An example of a simple LINQ query . . . . . . . . . . . . . . . . . . . . 108
4.42 The reduction of the simple LINQ query example shown in Listing 4.41 to a series of chained method invocations. . . . . . . . . 109
5.1 The simple stereotypical data parallel foreach loop for which sufficient conditions for parallelization were developed. . . . . . . . . 124
5.2 An example of the style of loop intended for pipelining . . . . . . . . . 128
6.1 Coco/R grammar production for a list of actual context parameters in Zal. . . . . . . . . 136
6.2 Coco/R grammar production for a list of formal context parameters inZal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3 Coco/R grammar production for a set of declared read and write effects. 137
6.4 The modified Coco/R production for the enhanced foreach loop in Zal. 138
6.5 An example of the declaration and use of generic types in C#. . . . . . . . . 140
6.6 The key interfaces used to abstract the different kinds of type parameter lists and the methods declared on them. . . . . . . . . 147
6.7 The method for computing the heap effect of a statement block. . . . . 148
6.8 The computEffects method of the NameExpression AST node. . . . . 149
6.9 The implementations of the local effect computation methods LocalEffects on member and name expressions as well as block statements. . . . . . . . . 150
6.10 An example of the mapping of struct this contexts to stack variables during stack effect computation. . . . . . . . . 151
6.11 The IOwnership interface implemented by all types emitted by the Zalcompiler when Dijkstra Views based tracking is selected. . . . . . . . . 157
6.12 The IOwnership interface implemented by all types emitted by the Zalcompiler when Dijkstra Views based tracking is selected. . . . . . . . . 157
6.13 Extension methods on object which allow ownership properties to be read from any object and set on any object which supports it as implemented for the parent pointer version of the IOwnership interface. . . . . . . . . 159
6.14 The items in the OwnershipHelpers class which are used to facilitate the implementation of ownership tracking. . . . . . . . . 160
6.15 The implementation of a Zal class, with formal context parameters and declared constructor effects, in C# using custom attributes and the OwnershipHelpers library of helper methods. . . . . . . . . 161
6.16 The implementation of a Zal method with declared formal context parameters and effects, in C# using custom attributes and OwnershipHelpers. . . . . . . . . 162
6.17 An example of the implementation of a Zal property in C#. . . . . . . . . 163
6.18 The implementation of the SubContext class which is used to represent sub-contexts declared in type definitions. . . . . . . . . 164
6.19 An example of the use of sub-contexts as part of the implementation of a binary tree node. . . . . . . . . 165
6.20 The static ArrayOwnershipExtensions class which stores ownership information for arrays and provides access to this information via extension methods on the System.Array which provide the same functionality as required of types which implement the IOwnership interface as implemented for the parent pointer based system. . . . . . . . . 166
6.21 An example of the creation of an array object in Zal and how the ownership of the array is set. Notice that the standard AddChild method is called to set the ownership of the array. The add child method calls the Object SetParents method which in turn calls the SetParents extension method on the System.Array type. . . . . . . . . 167
6.22 The implementation of the ContextMap data structure used to implement static fields in classes parameterized by context parameters. . . . . . . . . 169
6.23 An example of the implementation of a Zal static field in C#. Notice that because the static field could be accessed from outside the class the plain field is retained for backwards compatibility, but that getter and setter methods are supplied for context aware code. . . . . . . . . 170
6.24 An example of how a static method is implemented in C# when the containing type has formal context parameters. . . . . . . . . 170
6.25 An example of the implementation of a Zal static property in C#. The original property can be optionally retained for use by existing C# programs, but is omitted for clarity from the listing above. The get and set methods are used by ownership aware code to marshall context parameters to the accessor implementations. . . . . . . . . 171
6.26 Sample implementations of EnhancedLoop for the IList and IDictionary
interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.27 The two interfaces supplied with the enhanced foreach loop library which collections can implement so they can be used with the enhanced foreach loop. . . . . . . . . 174
6.28 Sample implementations of the EnhancedLoop method for the library supplied IEnhancedEnumerable and the ParallelEnhancedLoop method for IIndexedEnumerable. The ParallelEnhancedLoop makes use of the Microsoft Task Parallel Library (TPL) parallel foreach loop implementation (see Section 6.5.2.1). . . . . . . . . 175
6.29 The ConstraintList interface showing context relationship addition, testing, and runtime test generation methods. . . . . . . . . 176
6.30 The extension methods in the runtime ownership library used to test the relationships between arbitrary contexts. . . . . . . . . 177
6.31 foreach loop parallelization using the TPL’s Parallel.ForEach method.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.32 An example of how a parallel enhanced foreach loop would be implemented. . . . . . . . . 179
6.33 Conditional foreach loop parallelization using the TPL's Parallel.ForEach method. . . . . . . . . 179
6.34 An example of using the pipelining library to create a pipeline; each stage only modifies the ImageSlice and the representation of the filter or detector in that stage, if any. . . . . . . . . 180
6.35 The implementation of task parallelism using a TPL Task. . . . . . . . 181
7.1 The original C# Render method with its doubly-nested for loop. . . . . . . . . 186
7.2 The C# implementations of the calculator AST Node class and BinaryOperator class. . . . . . . . . 187
7.3 The original C# implementation of the bank transaction system's Transaction and the Bank's transaction processing method. . . . . . . . . 189
7.4 A fragment of the sequential C# Spectral Method's key data structures and computational methods. . . . . . . . . 190
7.5 The original sample code used as a running example to show the operation of my proposed ownership annotation algorithm heuristic. . . . . . . . . 191
7.6 The first step of annotating the Result class, adding the owner. . . . . 192
7.7 The completion of the Result class’s annotation. . . . . . . . . . . . . 192
7.8 The first step of annotating the ResultWrapper class. . . . . . . . . . . 193
7.9 The final result of annotating the code shown in Listing 7.5. . . . . . . 193
7.10 The Zal implementation of the original Render method shown in Listing 7.1; note the loop has been rewritten as an enhanced foreach loop. . . . . . . . . 196
7.11 The Zal implementation of the calculator AST Node class and BinaryOperator
class. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
7.12 The Zal implementation of the Bank’s transaction processing method. . 198
7.13 The annotated version of the Spectral Methods example. . . . . . . . . 199
7.14 The C# implementation of the Zal Render method shown in Listing 7.1. . . . . . . . . 200
7.15 The implementation of part of the Zal calculator example in C#; note the task parallelism in the Compute method. . . . . . . . . 201
7.16 The implementation of the Zal transaction processing method in C#; note the pipelined foreach loop. . . . . . . . . 202
7.17 The compiler output for the Zal implementation of the Spectral Methods example. . . . . . . . . 203
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a
degree or diploma at any other higher education institution. To the best of
my knowledge and belief, the thesis contains no material previously published
or written by another person except where due reference is made.
Signature: Andrew Craik
Date:
Acknowledgements
A PhD and the thesis written to obtain it is often viewed as an individual achieve-
ment; however, this achievement would not be possible without a host of supporting
personalities and I take this opportunity to thank them all.
Firstly, I would like to thank my parents for their unconditional love and support
and for engendering in me a love of science and learning which drove me to pursue
my education and question why and how the world works. Thank you also to my
grandparents. While not all of them lived to see me complete this work, their love
and support across the miles all these years has always been a great encouragement.
Lastly, I would like to thank my lovely Rachel who only joined me part way through
my PhD journey, but who put up with my unsociable lab hours and crazed ramblings.
My candidacy would not have been half as fun, memorable, or enjoyable without her.
No student new to the art of research can hope to learn the craft without a skilled
mentor and I must thank my principal supervisor Dr. Wayne Kelly enormously for his
skilled guidance during my candidacy. I would also like to thank Prof. John Gough,
who together with Dr. Wayne Kelly, helped to write the C] compiler I modified and
extended as part of my thesis. This work would not have been possible without their
assistance. Lastly, I would like to thank Prof. Paul Roe, my associate supervisor, for
our many and varied discussions.
During September 2009 I had the pleasure of visiting the Victoria University of Welling-
ton to present my research and to obtain advice on some of the formalism required to
round-out the presentation of my ideas. I must thank all the members of VUW with
whom I interacted for a most productive and thoroughly enjoyable stay. I would espe-
cially like to single out and thank Dr. Alex Potanin, Prof. James Noble, Dr. David
Pearce, and Dr. Nicholas Cameron for making me so welcome and taking the time to
share their knowledge and expertise with me.
I would also like to thank my lab mates for their comradeship and encouragement as
we struggled through our respective research journeys. While many people have passed
through the lab during my candidature I would especially like to thank Richard Mason,
Darryl Cain, Jiro Sumitomo, Matthew Brecknell, Lawrence Buckingham, Peter Ansell,
Wayne Reid, and Donna Teague.
Lastly, I would like to thank all of my friends at St John Ambulance QLD. My in-
volvement with First Aid Services has provided a wonderful balance to my life during
my candidature; I could not have been made more welcome. I would especially like
to thank the leaders, past and present, of Brisbane Central No. 2 Division includ-
ing Chris, Damien, Doris, and Clayton for creating such a wonderful environment and
understanding when my candidature had to come first.
Chapter 1
Introduction
“Writing correct and efficient parallel programs is a major challenge that
calls for better tools and more abstract programming models to make thread
programming safer and more convenient” — Sodan et al. [109]
Parallel computing has long been a field of active research in computer science. Since
the earliest days of modern computing in World War II at Bletchley Park, multiple
processing units have been used to concurrently execute parts of a program.
Traditionally, parallel computing was primarily of interest to those working in highly
specialized domains, such as the field of high performance computing, where the em-
phasis is on maximizing the throughput of carefully written computationally bound
numerical applications. Apart from these specialized programs, the vast majority of
general purpose programs were written and executed in a sequential manner; most com-
puters had only one processor and so there was no benefit to parallelizing programs.
Each new generation of silicon manufacturing technology shrinks the size of a transis-
tor which allows more transistors to be packed onto a chip of the same size. This fact
is usually stated in the form of Moore’s Law which states that the number of tran-
sistors that can be placed on an integrated circuit chip doubles approximately every
two years [87]. These additional transistors were, traditionally, used by chip designers
and manufacturers to increase the instruction throughput of the single computational
core found on most chips. This meant that with every new hardware generation, all
applications received a free performance boost.
These single core performance improvements were achieved through a combination of
higher clock speeds, larger caches, and the implementation of hardware optimizations
[116]. By the early 2000s, the limits of these optimization techniques were being reached
due to a number of physical issues: chip heat output began to outstrip the ability to
dissipate it, chip power demand began to exceed what could be easily supplied, and
increased leakage current and other parasitics reduced efficiency [116]. The net result of
this is that computational cores are no longer getting faster. To continue increasing the
instruction throughput of chips, manufacturers began to use increases in the number
of transistors per chip to implement additional computational cores on the same chip.
For applications to fully exploit additional computational cores there will need to be
changes in the design and implementation of software. Computer cores are now growing
in number rather than in speed, as has previously been the case. Performance gains can
only be realized on newer hardware when the program to be executed can be broken
into chunks for execution on multiple cores concurrently.
The historical focus on sequential program execution has produced tools and program-
ming practices which are not easily amenable to parallelization. The race is now on
to find ways to make parallelism accessible to all programmers for everyday general
purpose computing despite this sequential legacy.
1.1 Parallelism
There is a clearly articulated need to allow more programmers, who do not have spe-
cialist parallel computing training, to write programs in mainstream programming lan-
guages, like Java and C#, which are able to exploit the latest multi-core processors.
Unfortunately, mainstream programming languages, like Java and C#, default to using
sequential semantics and so programmers tend to employ sequential problem solving
tactics and write sequential programs.
The amount of parallelism in programs written using these mainstream languages is
primarily dictated by the algorithms they employ. Algorithms can be broadly classified
into two groups: explicitly parallel and sequential. Sequential algorithms often still
contain some exploitable parallelism in the form of inherent parallelism.
1.1.1 Explicit vs. Inherent Parallelism
An explicitly parallel program or algorithm is one that employs parallel constructs
to explicitly specify which parts of the code should be executed concurrently. There
are many parallel programming languages and APIs that allow such parallelism to
be expressed. An implicitly parallel program or algorithm, by contrast, is one whose
parallelism is not stated explicitly but can be discovered through program analysis: such
analysis identifies the parts of an inherently parallel program that could be executed in
parallel without changing the semantics of the program.
1.1.2 Developing Parallel Programs
There are two broad classifications of the approaches to developing parallel programs.
In some cases, it is natural to conceptualize the problem being solved in an explicitly
parallel form. In others, it is more convenient to first develop a working sequential
program. Once the sequential version of the program is fully tested and debugged, it
is converted into a parallel form. The process of transforming a working sequential
program into an equivalent parallel program is referred to as parallelization.
When parallelizing a program it is necessary to ensure that the semantics of the original
are not changed during the transformation — the parallel program should still produce
exactly the same results as the sequential program; it should just do so faster. One way
of ensuring that a transformation preserves the semantics of the original program is to
ensure that Bernstein's Conditions for parallelism are satisfied. Bernstein's Conditions
[15], as originally defined, state that two blocks of code, S1 and S2, can be safely
executed in parallel provided that:
• IN(S1) ∩ OUT(S2) = ∅
• OUT(S1) ∩ IN(S2) = ∅
• OUT(S1) ∩ OUT(S2) = ∅
where IN(S) is the set of memory locations used by S and OUT(S) is the set of memory
locations written to by S [15]. In this thesis I will rely on Bernstein's Conditions to prove
that the parallelization performed by my system preserves the semantics of the original
program.
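To make these conditions concrete, the following C# sketch (my own illustration, not part of the thesis's formal framework) represents the read and write sets of two code blocks as sets of abstract memory-location names and tests the three intersections directly; the BernsteinCheck class, its method, and the location names are all hypothetical.

using System;
using System.Collections.Generic;

static class BernsteinCheck
{
    // True when all three of Bernstein's Conditions hold for blocks S1 and S2,
    // where inN/outN are the sets of memory locations used and written by block N.
    public static bool CanRunInParallel(
        ISet<string> in1, ISet<string> out1,
        ISet<string> in2, ISet<string> out2)
    {
        bool c1 = !in1.Overlaps(out2);   // IN(S1)  ∩ OUT(S2) = ∅
        bool c2 = !out1.Overlaps(in2);   // OUT(S1) ∩ IN(S2)  = ∅
        bool c3 = !out1.Overlaps(out2);  // OUT(S1) ∩ OUT(S2) = ∅
        return c1 && c2 && c3;
    }

    static void Main()
    {
        // S1: x = a + 1;  reads {a}, writes {x}
        // S2: y = a * 2;  reads {a}, writes {y}
        var in1 = new HashSet<string> { "a" }; var out1 = new HashSet<string> { "x" };
        var in2 = new HashSet<string> { "a" }; var out2 = new HashSet<string> { "y" };
        Console.WriteLine(CanRunInParallel(in1, out1, in2, out2)); // True: the blocks may run in parallel
    }
}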
1.1.3 Parallelization
Figure 1.1 shows the relationships between sequential and parallel algorithms and im-
plementations. When parallelizing an application, the algorithms used in some parts of
the program may need to be replaced with alternative algorithms that produce the same
net result while being more amenable to parallelization. Such algorithm transforma-
tion cannot generally be performed automatically using tools; algorithm replacement
requires a detailed understanding of program semantics which requires human intellect
and analysis.
Figure 1.1: A diagram showing the relationships between algorithms, programs, and parallelism. The process of parallelization is the process of making implicit parallelism explicit so that it can be exploited.
For other parts of the program, the algorithms employed may not need to be funda-
mentally changed. Instead, inherent parallelism contained in the original algorithm can
be exploited. Consider the trivial program fragment shown in Listing 1.1.
In this fragment the assignment to z requires the values of x and y to be set. Steps
which set the values of x and y do not depend on one another and so can be executed
in parallel provided that the assignments are completed before computing the value for
int x = 1;
int y = 2;
int z = x + y;
Listing 1.1: A small program snippet used to illustrate the idea of implicit parallelism.
z. This parallelism is called inherent parallelism and exists because the dependencies
between the steps of a sequential algorithm do not necessarily require all previous steps
to have been completed, just those whose results are consumed. This thesis addresses
the analysis required to accurately identify such inherent parallelism.
There are a number of different techniques which can be used to detect and safely exploit
inherent parallelism in sequential programs. Speculative execution is a major field of
research competing with program analysis techniques in the race to facilitate safe paral-
lelization of applications. In these speculative approaches, different parts of a program
are speculatively executed in parallel without static guarantees about the dependencies
between them using a transactional memory system. If conflicting data accesses occur,
the runtime memory transaction system is expected to return the program to a consis-
tent state before resuming the program’s execution. These speculative approaches have
the benefit of not requiring a detailed analysis before the benefits of parallelization can
be obtained, but this comes at the cost of the complex and resource intensive runtime
needed to ensure the safety and correctness of the program being executed.
Today, programmers can manually parallelize programs using tools like OpenMP [94]
and the Microsoft Task Parallel Library (TPL) [80] to express parallelism. If the se-
quential program is not fully optimized prior to parallelization, this manual process
may yield additional performance gains. However, for well written programs, man-
ual parallelization is a time consuming and error-prone process. It is often hard for
programmers to be certain that such parallelization is safe — that the transformed
program will produce the same results as the original program. As I shall explain
in Section 1.2.1, completely automated parallelization is beyond the current state-of-
the-art. This thesis, as previously stated, aims to develop techniques which can be
used both to identify opportunities for parallelization and to validate the safety
of programmer-supplied parallelization.
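To illustrate the manual approach mentioned above, a programmer using the Task Parallel Library might express the parallel execution of two independent pieces of work as in the following minimal sketch; DoX and DoY are placeholder methods of my own, and nothing in the library verifies that they are actually free of conflicting side-effects.

using System;
using System.Threading.Tasks;

class ManualParallelismExample
{
    static void DoX() { /* some independent work */ }
    static void DoY() { /* some other independent work */ }

    static void Main()
    {
        // Both delegates run concurrently; Invoke returns once both finish.
        // The burden of proving that DoX and DoY have no data dependencies
        // rests entirely on the programmer.
        Parallel.Invoke(DoX, DoY);
    }
}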
I feel the key to successfully exploiting inherent parallelism is facilitating reasoning
about parallelism by both programmers and automated tools. Sequential programs
contain inherent parallelism which my techniques aim to exploit; the techniques
presented in this thesis make such opportunities easier to find and expose opportunities
that would otherwise go unnoticed. The goal of this thesis is to combine
abstraction and composition with parallelization and correctness checking to produce
a framework which helps both programmers and automated tools to reason about
inherent parallelism; this is not done by existing systems.
1.2 Background
There are a number of different programming paradigms; each has its own strengths
and weaknesses. Different paradigms have different core principles and each has its
own classes of problems for which it is best suited. All programming paradigms make
use of a computer’s ability to store intermediate computation results and recall them
for use later in a program’s execution. Where paradigms differ is in how this ability is
exposed to the programmer. In the declarative paradigm (which includes functional
and logic programming languages) this ability is not directly exposed — it is used
implicitly through each language’s semantics. The imperative paradigm, by contrast,
does directly expose this ability.
The choice of exposing a computer’s ability to store intermediate results to programmers
or not has a significant impact on the nature of a paradigm and the reasoning required
to parallelize programs written in its member languages. In those which hide this abil-
ity, like functional programming, the side-effects of executing a code block are highly
constrained; access to arbitrary shared state is prohibited and side-effects tightly con-
trolled through the type system. These constraints make reasoning about parallelism
easier. Code blocks in languages exposing access to arbitrary shared state can have
almost arbitrary side-effects which makes reasoning about possible data dependencies,
and hence parallelism, much more difficult.
The imperative paradigm is one of the most popular paradigms in use today. A large
number of heavily used commercial languages are generally classified as being part of
the imperative programming paradigm including Java, C], and Visual Basic. I have
chosen to focus this thesis on how to reason about parallelism in imperative languages
for several reasons. Firstly, imperative languages are some of the most popular and
widely used languages in the world today, making parallelization of programs in these
languages especially urgent and important. Secondly, reasoning about parallelism in
other paradigms that restrict access to shared mutable state is comparatively easy, and
so developing techniques for imperative languages is a more interesting research goal.
Finally, a significant amount of knowledge, experience, and skill has been developed
working with imperative languages and it would be desirable to be able to retain some
or all of this while facilitating the use of parallelism.
1.2.1 Traditional Data Dependency Analysis
There are two broad categories of dependencies in programs: control dependencies
and data dependencies. Control dependencies are well studied and understood; mod-
ern programming languages have a number of structured control flow abstractions and
constraints which make reasoning about control dependencies relatively easy. Data
dependencies have also been extensively studied, but such study has traditionally focused
on scientific applications, which tend to take the form of
several tightly nested loops traversing array data structures performing complex math-
ematical computations. In this section I will present the traditional data dependency
analysis techniques with a view to identifying a gap in the current techniques and
knowledge that I aim to address in this thesis.
1.2.1.1 Scalar and Local Dependency Analysis
The traditional approach to performing a data dependency analysis is to compare,
pairwise, all of the statements in the code fragment being analyzed to determine the
nature of the dependency between each pair of statements, if any. The data dependency
analysis itself operates on the level of individual variables. For value types, these analysis
techniques check variable names for equality to detect whether a dependence
can exist. When reference types are encountered, a may-alias analysis [6, 111]
needs to be performed to determine which variables might actually be referring to the
same object at runtime.
Methods cause significant problems in these techniques because of the pairwise consid-
eration of statements. Methods abstract sequences of operations in the program and
so to compute dependencies between method invocations and other statements, there
needs to be some means of resolving what operations the method performs. In modern
imperative languages which permit overriding and dynamic binding, it may not be pos-
sible to statically determine which method implementation will be invoked. With the
use of dynamic binding, the method to be invoked may not have been implemented yet
and so a dependence analysis cannot be performed. Worse still, this technique causes
an explosion in the number of dependence analyses which must be performed as the
size of the code fragment increases.
1.2.1.2 Pointer/Reference May-Alias Analysis
Computing data dependencies in programming languages which have pointers, refer-
ences, or reference types is further complicated by the fact that a single memory location
may be referred to by multiple names within a single program. To conduct data de-
pendency analysis in the presence of such aliasing, the traditional solution would be to
undertake a may-alias analysis to determine which variables in the program could refer
to the same location in memory [6, 86].
A may-alias analysis is a static analysis and so it does not have access to actual pointers
and objects. Instead of tracking objects and memory addresses, may-alias analyses try
to disambiguate variables using allocation site information [6]. An allocation site is the
method responsible for allocating an object; therefore, if two objects are created at the
same allocation site, a traditional may-alias analysis would identify that the two objects
could be the same. This is an approximation of the program’s actual behavior and can
result in a number of false-positive aliasing relationships being identified [86].
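The following hypothetical fragment (the class and method names are my own placeholders) illustrates the imprecision: both arrays are created by the same factory method, so an analysis keyed on allocation sites must conservatively assume they may alias even though at runtime they never do.

class BufferFactory
{
    // A single allocation site: every buffer the program creates through this
    // method comes from the same 'new' expression.
    public static int[] MakeBuffer(int size)
    {
        return new int[size];
    }
}

class Client
{
    static void Use()
    {
        int[] a = BufferFactory.MakeBuffer(10);
        int[] b = BufferFactory.MakeBuffer(10);
        // At runtime a and b are distinct arrays, but an allocation-site based
        // may-alias analysis models both by the same abstract object and so
        // must conservatively report that a and b may alias.
    }
}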
1.2.1.3 Array Data Dependency Analysis
Loops are one of the major sources of parallelism in imperative programming languages.
Loops can be parallelized in several different ways, depending on the loop’s inter- and
intra-iteration data dependencies. The traditional approach to finding a loop’s inter-
and intra-iteration data dependencies is to perform a loop specific data dependency
analysis. Consider the sample nested loops shown in Listing 1.2.

for (int i = 1; i < n; ++i) {
    for (int j = 1; j < n; ++j) {
        f[i] = g[3 * i - 5] + 1;
        g[2 * i + 1] = i * j;
    }
}

Listing 1.2: An example of a pair of nested loops used in the discussion of traditional array data dependency analysis.
As was the case with the dependency analysis techniques already discussed, array de-
pendency analysis techniques consider all of the statements in the loops being analyzed
in a pairwise manner. So considering the example shown in Listing 1.2, we would want
to determine if g[2 * i + 1] and f[i] could be referring to the same location. The
first step would be to perform a may-alias analysis to determine if the two arrays, f
and g, could be aliases. If they could be aliases then it remains to determine if i and
2 * i + 1 could ever be equal over the range of i.
A number of different techniques have been proposed to try to answer questions about
the relationships between different array index expressions. These techniques operate
only on affine indexing expressions. One of the simplest approaches, proposed by
Banerjee, uses the Greatest Common Divisor (GCD) to determine if the two
array index expressions could be equal [11]. Banerjee requires the loop to be normalized
to iterate from 1 to its terminal value, incrementing by 1. Once the loop is in this form,
the array index expressions are arranged into the forms a * i + b and c * i + d.
Banerjee proved that if a loop-carried dependence exists then GCD(c, a) must divide
(d − b).
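Applied to Listing 1.2, the read g[3 * i - 5] and the write g[2 * i + 1] give a = 3, b = −5, c = 2, d = 1; GCD(3, 2) = 1 divides d − b = 6, so a dependence through g may exist. The following sketch is my own illustration of the test, not part of any production analysis.

using System;

static class GcdTest
{
    static int Gcd(int x, int y)
    {
        x = Math.Abs(x);
        y = Math.Abs(y);
        while (y != 0) { int t = x % y; x = y; y = t; }
        return x;
    }

    // Banerjee's test for index expressions a*i + b and c*i + d: a dependence
    // through the array can exist only if GCD(a, c) divides (d - b).
    public static bool MayDepend(int a, int b, int c, int d)
    {
        return (d - b) % Gcd(a, c) == 0;
    }

    // For Listing 1.2: MayDepend(3, -5, 2, 1) returns true, since GCD(3, 2) = 1
    // divides 1 - (-5) = 6, so a dependence through g may exist.
}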
Techniques for detecting when array index expressions could lead to dependencies
through arrays continued to evolve during the 1990s. In 1991 it was proved that solving
a system of constraint equations over array index expressions to find out if they could
cause a data dependency on the array is an NP-complete problem [75]. One of the
most advanced techniques which allows precise solutions to systems of affine array in-
dex equations to detect dependencies was proposed by Pugh in the form of the Omega
Test [98, 97]. The Omega Test ultimately showed that, for systems
of affine equations, the data dependence analysis can be performed in a reasonable
amount of time [68].
Function invocations may also cause problems for array data dependency analysis since
it is not possible to tell what a method will do when considering statements in a pairwise
manner. Further, non-affine index expressions or unusual loop traversal patterns can
easily foil many array dependency analyses.
1.2.1.4 Summary
Traditional data dependency analyses have been shown to work well on scientific pro-
grams with small, tightly nested loops [97]. Unfortunately, the limitations of inter-
procedural analyses and may-alias analyses mean that in popular, imperative, object-
oriented languages, traditional analysis techniques do not always produce accurate
results; a high rate of false-positive dependencies is identified [58].
These modern languages do not provide language features to help facilitate reasoning
about data dependencies. Even in the best case of fully context sensitive data depen-
dency analysis, the allocation sites tracked are still approximations. This thesis will
contribute a new approach to the problem of performing data dependence analysis for
parallelization purposes by extending an object-oriented language with features to help
facilitate dependency and effect analysis. Unlike the traditional techniques discussed
in this section, my new approach is designed to be abstract, composable, and usable
by both programmers and automated tools. By composable, I mean to say that paral-
lelism analysis can be performed at levels of granularity below that of whole program
analysis. This means that libraries and other reusable software components can be
used in programs without having to include all of their implementation details in any
parallelism analyses undertaken.
1.3 Type Systems
The type system in a programming language is fundamental to how programmers and
automated tools, like compilers, understand and reason about programs. There is a
whole spectrum of languages whose type systems provide different features and different
guarantees of varying strengths about what a program may or may not do at runtime.
Some languages choose to try to enforce guarantees and invariants prior to program
execution while others only type check the program as execution proceeds.
Programming languages can be classified into four groups based on the strength of the
guarantees and invariants provided by the language and when programs are checked
to ensure that they do not violate guarantees and invariants provided by the language.
Languages which validate programs prior to execution are said to be statically typed
while those which defer such validation to runtime are said to be dynamically typed.
Languages which provide strong guarantees and invariants are said to be strongly typed
while those that do not are said to be weakly typed. Most languages fall within a two-
dimensional spectrum defined by these criteria [107]; some are more statically
typed than others, for example.
As previously stated, it is my belief that reasoning about parallelism is the key to
making parallelism more accessible. When trying to construct a system for reasoning
about inherent parallelism it is highly desirable to have as much reliable and detailed
information about a program as possible. Statically typed languages, by their nature,
provide invariants which are amenable to validation prior to program execution. These
invariants can sometimes be used to make reasoning about parallelism easier. Because
of this, I have chosen to focus this thesis on strong statically typed languages. The
techniques I present might be adapted to work for other types of language, but this is
beyond the scope of this work.
1.4 My Approach
Computer programmers must maintain a mental image of the algorithm a program is
implementing, the data structures it is operating on, and the control flow required to
implement the algorithm and manipulate the data structures being processed. Compilers
must likewise maintain such an understanding of a program when computing
program invariants and undertaking performance optimizations on it. Programming
languages have evolved to facilitate this reasoning by providing different levels of ab-
straction. These abstraction mechanisms allow analyses to be compartmentalized and
reduce their complexity by hiding details not required for the analyses. For example,
control flow can be abstracted into functions. The power of abstraction is realized only
when sufficient information is exposed to facilitate reasoning while hiding unneeded
information. Obviously, this balance changes when the nature of the reasoning being
undertaken changes.
One classic example where exposing too much detail made understanding more difficult
is the use of GOTO statements in high-level programming languages. In his famous
letter to the CACM, Dijkstra argued that GOTO statements break the abstractions
used by programmers to reason about the state of a program during execution (the
programmer’s coordinate system as he called it) [40]. Over time, this view has been
largely accepted by designers of high-level programming languages and GOTO has been
replaced by other, more structured, control flow constructs.
In contrast to the problem of the GOTO statement, where the statement causes ab-
stractions to be violated, determining side-effects in imperative programs is complicated
by the lack of detail provided by some control flow abstractions, especially functions.
Functions allow code blocks to be abstracted and reused. Unfortunately, in the family
of imperative programming languages I have chosen to focus on, there are no constraints
on the side-effects a function may have; they can access and update arbitrary shared
mutable state. Worse, their interface contract stipulates only the input and output
data types; the interface does not provide guarantees about the side-effects of invok-
ing a function. This lack of effect information makes reasoning about inter-procedural
dependencies difficult. I feel this is one of the key reasons why there has been only lim-
ited success in making parallelism more accessible to programmers using these popular
languages.
The obvious solution to this problem is to add side-effect declarations to function sig-
natures to capture the memory locations touched by the function. The actual memory
locations described by these declarations may not be known statically and could total
millions of individual memory locations. This means that the memory locations can-
not be enumerated individually on the function signature. It is, therefore, necessary to
abstract and summarize these effect sets in some way. One approach would be to list
logical regions or subsets of the entire memory space that include all of the memory
locations touched. These subsets must contain all of the memory locations touched,
but may also include memory locations not actually touched. Figure 1.2 shows an ab-
stract diagram of memory regions. Dependencies may exist when regions named in the
effect sets concerned overlap (these overlapping areas are visible in Figure 1.2). The
more precise the effect sets, the smaller the number of false positives produced when
computing dependencies through set overlap. In cases where the effect sets cannot be
computed, the effect can be conservatively stated as the entire memory space. Effect
precision can be further enhanced by separating the read and write effect sets since the
overlap of two read effects does not create a data dependency.
Figure 1.2: Abstract memory regions which can be named as effects. The areas of overlap between regions indicate possible data dependencies.
To make reasoning about parallelism, the goal of this thesis, practical, I feel that both
programmers and automated tools must use a unified system to reason about side-effects
and dependencies. Programmers must be
able to describe and understand the side-effects in their programs and compilers must
be able to verify side-effects and communicate to the programmer which side-effects in
a program are hindering parallelism. If the parallelization process is shared between
the programmer and the automated tools and they are not using the same reasoning
system then communication is hindered and the effectiveness of the system is reduced.
1.4.1 Object-orientation
Given that I have chosen to focus on reasoning about parallelism in strong statically
typed, imperative languages, the next logical question is to consider precisely what
techniques could be used to build the effect system required. Capturing side-effects
in an abstract and composable manner requires programs to have structure to their
data and code. The family of imperative languages includes a large number of diverse
languages. Some are focused on the sequence of steps required to solve a problem,
but provide relatively few features for structuring the data being processed. Other
languages place more of an emphasis on the data being processed and so provide more
powerful features for structuring data and associating code with data.
Over the years, many different schemes for structuring an imperative program’s data
and code have been developed. Object-orientation is one such scheme which was de-
veloped in the 1960s and 1970s in languages like Simula [16, 92] and Smalltalk [47].
Object-orientation is centered around three fundamental concepts: encapsulation, in-
heritance, and dynamic method binding [107]. These principles encourage programmers
to structure their programs so that implementation details can be abstracted away and
different parts of the program are decoupled from one another.
The most interesting property of object-orientation which makes it attractive as a means
of abstracting and reasoning about side-effects is encapsulation [107]. Encapsulation
means restricting access to implementation details of an object [107]. This means
that objects which form part of an object’s representation (those objects which store
its internal state) should be protected from external access. Objects nest inside one
another to form a representation hierarchy which provides structure to the program.
Encapsulation, therefore, provides a means of abstracting parts of the program to hide
implementation details [57, 5].
In Figure 1.2, I showed how memory regions could be used to abstract program memory
operations, yet retain the ability to detect possible data dependencies between different
operations. Object-orientation allows this idea to be refined in terms of the hierarchy
of object representation formed by the principle of encapsulation. This structure can
be used as a basis for memory regions [28]. Naming an object as being read or written
can be taken to mean that the object itself or any object in its representation is read
or written. Because an object is normally only part of one object’s representation, the
memory regions become shaped like triangles and represent sub-trees of the program’s
representation structure. Two effects overlap if one is part of the other’s representation.
This structure is shown in Figure 1.3.
Figure 1.3: Memory regions based on an object-oriented program’s representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representation. The inclusion of one triangle within another indicates that a data dependency could exist if they were named as effects on different operations.
I have chosen to focus this thesis on reasoning about parallelism in imperative object-
oriented languages because of the structure they provide to programs and their popu-
larity. There are a number of advantages and disadvantages to this approach compared
with traditional data dependency analysis techniques and this comparison will be dis-
cussed in detail in Chapter 8. The techniques discussed in this thesis may also apply
to languages with less structure or other types of structuring, but these languages are
considered to be outside the scope of this work.
1.4.2 Ownership Types
The structure of object-oriented programs, while useful to facilitate reasoning, is not
sufficient, on its own, to capture side-effects in an abstract and composable manner.
There has been a large amount of research in the program verification community on
using the structure of object-oriented programs to facilitate validation of invariants.
These systems can be adapted to facilitate inference of parallelism invariants; this
thesis adapts one such system called Ownership Types.
Ownership Types was originally constructed in response to validation experts noticing
that encapsulation enforcement was lacking in many popular object-oriented languages.
Consider the code in Listing 1.3.

private Object[] signers;
...
public Object[] getSigners() {
    ...
    return signers;
}

Listing 1.3: A code snippet showing the Java 1.1.1 getSigners bug.

Note that despite the private annotation on the signers field, it is possible for the
getSigners method to return the object referenced
by this field. The private annotation on the field protects only the name of the field
and not the data it contains. This code was the source of the infamous getSigners
bug in Java 1.1.1 for precisely this reason [113]. Ownership Types [28, 25, 27, 95, 73] is
one of the systems originally proposed to enforce this kind of protection in a rigorous
manner.
Ownership Types is one means of tracking this information because it makes the pro-
gram’s representation hierarchy explicit through the type system and also provides
facilities for abstracting these representation descriptions. It has been proposed in the
past that Ownership Types could be used to facilitate parallelization [20, 19], but its
use in discovering inherent parallelism has only just begun to be studied [18]. In this
thesis I use Ownership Types to provide a framework for abstractly expressing and
validating side-effects as well as detecting data dependencies which allows reasoning
about inherent parallelism.
1.5 Contributions
This thesis focuses on how to facilitate reasoning about the inherent parallelism in
strong, statically-typed, imperative, object-oriented languages. Reasoning about inter-
procedural dependencies in these imperative languages is complicated by the arbitrary
side-effects that can be caused by methods. The central idea is to build an abstract and
composable effect system that can be used to specify side-effects as part of a method’s
signature and to then use this effect information to detect data dependencies which
can, in turn, be used to find inherent parallelism.
The major contributions of this thesis are:
• Pulling together features from Ownership Types and adapting them to support
capturing, abstracting, and validating side-effects in a real language on real
programs, in order to facilitate the detection of inherent parallelism.
• Developing and proving sufficient conditions for parallelism in terms of ownership
effects and relationships for a number of different parallelism patterns.
• The development of a runtime system to complement static reasoning about par-
allelism to facilitate conditional parallelization of code blocks.
None of the existing works combine abstraction and composition with parallelization
and correctness checking to produce a framework which helps both programmers and
automated tools to reason about inherent parallelism. To demonstrate the practicality
of the ideas proposed, I have developed an extension of the C] version 3 language called
Zal (which means “dawn” in Sumerian, the earliest known written language). The
reasons for the choice of C] are discussed in Section 4.2. I have written a Zal compiler
which computes and validates effects in addition to using the effect information to safely
inject parallelism into an application. A complementary runtime system, which is used
to facilitate conditional parallelism, has also been written in addition to libraries to
support the ownership and effects annotations added to the C] language. There are
a number of smaller, more technical contributions made through this implementation
process and these will be identified in the appropriate chapters.
The work contained in this thesis has produced publications [100, 31] including a paper
in a well regarded international conference in the area [31] as well as several technical
reports detailing different refinements of the proposed system. I extended the GPC]
compiler’s generic type infrastructure to support arbitrary type parameters. I then
used this infrastructure to help implement a complete compiler for the Zal program-
ming language, a compiler extension totalling over 10,000 lines of code. In addition
to this, I developed a number of runtime libraries to support tracking of ownership
and exploit conditional parallelism detected at compile-time. All of the publications,
the compiler, and the runtime libraries as well as the sample applications presented
in Chapter 7 developed as part of this research project are freely available from the
MQUTeR parallelism website [32].
1.6 Outline
Chapter 2 focuses on how to reason about data dependencies and inherent parallelism
in an abstract and composable manner. The chapter begins by explaining how the
representation hierarchy found in all object-oriented programs can be exploited to cap-
ture side-effects in an abstract and composable manner. The chapter then proceeds to
present sufficient conditions for the safe exploitation of task, loop, and pipeline par-
allelism. To help facilitate parallelization I formulate a new enhanced foreach loop
which can be used to rewrite for and while loops into a form suitable for analysis. The
discussion of the effect system and sufficient conditions is kept abstract and informal in
this chapter. The ideas are refined and formalized in subsequent chapters culminating
in the formal definitions and proofs of correctness presented in Chapter 5.
Chapter 3 further refines the ideas presented in Chapter 2 and presents how these ideas
can be realized using concepts from Ownership Types. The discussion in Chapter 2
focuses on the high-level concepts of how objects and their relationships could be used
to express side-effects. The focus of Chapter 3 is on how object relationships can be
captured as part of a type system using Ownership Types. It also refines and begins to
formalize the sufficient conditions for parallelization presented in Chapter 2 in terms
of the Ownership Types system.
Chapter 4 continues the process of refining and realizing my ideas by presenting the
design of an extension of the C] language called Zal. Zal incorporates the ideas from
Chapters 2 and 3 into version 3.0 of the C] language. In doing so, it applies ideas
from Ownership Types to a number of program constructs which have not previously
been annotated with ownership information. It also examines some of the interesting
technical details discovered in undertaking this design and implementation exercise.
Having refined my ideas and incorporated them into a real-world programming lan-
guage, it remains to demonstrate the validity of my approach. Chapter 5 begins this
validation by sketching the formal proofs required for the proposals made in preced-
ing chapters. It also proves that the sufficient conditions for parallelism presented in
Chapter 3 are sound.
Chapter 6 discusses the design of a source-to-source compiler for Zal as well as a runtime
system for tracking ownership information and testing object relationships at runtime.
A number of interesting technical challenges and design issues were encountered during
the implementation process and these are discussed as the design is presented. The
compiler helps to demonstrate that the ideas proposed in this thesis can be realized
and serves as an enabler for the empirical validation presented in Chapter 7.
Having produced a compiler for Zal, it was used to compile a number of representative
sample programs written in Zal. Chapter 7 presents several detailed worked examples
of real programs originally written in C]. This chapter details the annotation of these
examples along with performance data comparing the automatically parallelized Zal
program to a hand parallelized implementation. The goal of this exercise is to validate
the proposals made in this thesis in terms of efficacy and effectiveness.
With the validation completed, Chapter 8 compares and contrasts the work in this
thesis with other related works. There are three broad areas of related work in the lit-
erature: traditional data dependency analyses, Ownership Types systems, and existing
parallel programming languages. Previous work in Ownership Types has focused on
program verification. The idea of using Ownership Types to reason about parallelism
has been proposed in the literature, but has not been tested until now. Existing work
on languages and type systems for parallel computing has tended to focus on providing
the programmer with tools to facilitate the manual parallelization of programs; pro-
grammers must decide when and where it is safe to employ parallelism without the
assistance of the compiler or other automated tools.
I conclude in Chapter 9 and outline possible areas for future work and further devel-
opment of the system based on my experiences and the results of applying the system
to real-world programs.
Chapter 2
Reasoning About Parallelism
This chapter presents my key ideas and insights about how side-effects can be statically
computed, summarized, and abstracted using memory regions. The intersection of the
effect sets, in terms of memory regions, of two arbitrary code fragments can be used
to detect possible dependencies between them. This static reasoning system forms the
core of this thesis and is used to determine when inherent parallelism can be safely
exploited.
The ideas presented in this chapter are in their most abstract and theoretical form.
Subsequent chapters will refine these ideas into a form where they can be applied to
a specific language and, in turn, that language will be used to parallelize some real
applications to validate my proposals.
This chapter is divided into two parts. The first part presents my ideas for an abstract
and composable effect system. This effect system describes effects for arbitrary code
blocks using memory regions. These effect descriptions contain all of the areas touched
by the code being analyzed, but may also include some additional areas. Following
the development of the effect system, I demonstrate how dependencies can be detected
by computing the intersection of the memory regions touched by the code blocks in
question.
The second half of this chapter develops conditions for the safe exploitation of a num-
ber of different parallelism patterns in terms of this effect system. Because the effects
produced by the effect system may include additional memory locations not actually
touched by a given code fragment, the effects are not precise. This means that condi-
tions presented in terms of these effects are sufficient conditions. In describing these
conditions as sufficient, I am saying that there may be situations where the conditions
do not support parallelization of an application even though such parallelization would
not actually alter the behavior of the program. My system will never permit paralleliza-
tion when doing so could lead to behavior not consistent with the original sequential
version of the program.
The different parallelism patterns discussed in this chapter exploit different types of
inherent parallelism and so each has different sufficient conditions for safe use. This
chapter informally presents these sufficient conditions; they are refined in Chapter 3
formalized and proven in Chapter 5.
2.1 Side-effects
There are two types of dependencies that can exist in a program: control dependencies
and data dependencies. A control dependency occurs when a statement A determines
whether a statement B executes, while a data dependence occurs when a statement B accesses
the same memory as an earlier statement A.
Methods in imperative programming languages can be sources of side-effects which
can be involved in both kinds of dependency. Inter-procedural control dependencies
in these languages generally take the form of exceptions. It is possible to document
control-flow side-effects as part of the method signature, as was done with checked
exceptions in Java. Detecting, documenting, and enforcing restrictions on control flow
side-effects is a well studied problem both inter- and intra-procedurally. This thesis
does not concern itself with control flow dependencies of this type, but does acknowl-
edge that they exist in programs and may need to be accounted for, in order to preserve
correctness, when parallelizing an application. The remainder of this thesis will focus on reasoning
about data dependencies, a problem that is not nearly as well studied, with a view to
facilitating the discovery and safe exploitation of inherent parallelism in mainstream
modern imperative programming languages.
Imperative programming languages expose the computer’s ability to store and retrieve
intermediate results to programmers. This means that any given program fragment
written in an imperative programming language can update arbitrary parts of the pro-
gram’s shared mutable state. This does not cause a problem when only one instruction
in the program is executed at a time. However, when multiple instructions are executed
at the same time, instructions may attempt to read values before they are generated,
read values that have been updated by instructions that should be later in the pro-
gram, or race to update the same memory location. That is to say that executing
multiple instructions at the same time may cause dependencies between instructions to
be violated.
Traditional dependency analysis techniques operate on the level of individual variables
and memory locations. In order to determine where data dependencies may or may not
exist, it is necessary to undertake some kind of may-alias analysis to determine which
variables may refer to which locations in memory. Mainstream imperative programming
languages provide no basis for tracking or reasoning about aliasing and so these analyses
are difficult, especially when performed inter-procedurally. In the presence of complex
control flow, polymorphism, and dynamic method binding, the results of a may-alias
analysis tend to be that most variables of compatible types are possible aliases due to
the lack of language support. It is my opinion that as a result of this, it is necessary to
change the language to provide support to make reasoning about side-effects and data
dependencies tractable.
In addition to the language features above that can complicate dependency analysis,
there are two other facilities, generally provided by modern imperative languages, which
complicate reasoning about side-effects and dependencies. The first is separate compi-
lation. Separate compilation allows a program to be compiled in separate parts and the
result of those compilations can then be linked together at a later time. This means
that the source for a function may not be available for examination when undertaking
one or more of the compilations.
In this thesis, I advocate capturing the maximum side-effects of an executable definition,
expressed as sets of memory locations read and written, as part of the definition’s
declaration. The programmer may provide effect declarations to be validated against
the actual implementation at compilation time. If the effects of the implementation
exceed the declared effects then the program is considered to be invalid.
The goal of this work is to facilitate reasoning about the side-effects of arbitrary code
blocks for the purposes of computing dependencies and identifying opportunities to
exploit inherent parallelism. Adding effects to method signatures facilitates this because
it makes examining a method’s signature sufficient to determine the upper bounds
of its side-effects, thus eliminating the complications caused by separate compilation.
Others have proposed capturing effects on method signatures, but my formulation of
this idea, presented in this thesis, is geared towards reasoning about parallelism rather
than validating program properties, the traditional domain of such techniques (see
Chapter 8 for further discussion of related work).
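To make the idea concrete, a method signature carrying an effect declaration might look something like the following. The attribute names and the Account and AuditLog classes are placeholders of my own, not the notation Zal actually uses (that notation is developed in later chapters); the point is only that a caller can bound the method’s side-effects from the signature alone, without inspecting its body.

using System;

// Placeholder effect attributes (illustrative only; not Zal's real notation).
[AttributeUsage(AttributeTargets.Method)] class ReadsAttribute : Attribute { public ReadsAttribute(string region) { } }
[AttributeUsage(AttributeTargets.Method)] class WritesAttribute : Attribute { public WritesAttribute(string region) { } }

class Account { public decimal Balance; }
class AuditLog { public void Append(decimal value) { /* record the value */ } }

class Bank
{
    // The declaration bounds the method's side-effects: it reads at most the
    // representation of 'account' and writes at most the representation of
    // 'log'. An implementation whose actual effects exceeded this declaration
    // would be rejected by an effect-checking compiler.
    [Reads("account"), Writes("log")]
    public void RecordBalance(Account account, AuditLog log)
    {
        log.Append(account.Balance);
    }
}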
Dynamic linking is also used frequently by modern programming languages and their
associated libraries to reduce executable file size and facilitate independent updates
of libraries. In dynamic linking, libraries and other third-party code are loaded into a
program only at runtime. This means that the effects of a method found in one of these
libraries at compile-time may not match the effects of the one loaded at runtime. This
problem is similar to other library versioning problems which have been well studied
and some of the same solutions can be applied. Simply checking that the version of the
library used by the compiler matches the version of the library linked in is sufficient
to avoid the problem of mismatches between effects at compile and execution time.
If greater flexibility is required with regard to versioning, then other more complex
solutions may be required.
2.1.1 Abstracting Effects
Having decided to add side-effects to method signatures, the question which arises is
how to describe method side-effects. Adding side-effects to method signatures could
expose implementation details that would otherwise be hidden from those calling the
method. Exposure of implementation details through effect declarations would break
composition; different implementations of the same interface could have different effect
declarations due only to their differing implementation details. Using an abstract effect
system could help to prevent this, if the abstraction is correctly chosen. Abstracting
effects has the added benefit of allowing precision to be traded for simplicity when the
amount of effect information becomes overwhelming. The key to obtaining all of these
benefits is choosing an appropriate effect abstraction system.
One way to try to obtain a suitable abstraction and composition mechanism for ef-
fects is to base the system on the program’s existing structure. Different imperative
programming languages provide different kinds of structure to programs. Some lan-
guages emphasize the sequence of steps to be executed and rely on functions as the
major unit of abstraction and organization; they provide relatively little structure to
a program’s data. Others place an emphasis on the structure of the program’s data
and provide means to associate code with data. One of the most popular program
structuring techniques in use today is object-orientation. In object-orientation, data is
grouped into objects which represent a concept or artifact in the system and consist of
data and associated functions called methods. Each object has its own internal state,
parts of which may be stored in other objects in the system; these referenced objects
holding this state are said to be part of the object’s representation. Viewing a pro-
gram’s heap memory in terms of objects and the representation relationships between
them provides a hierarchical view of the heap. This hierarchy can then be used to
describe and abstract effects without exposing implementation details. Most popular
modern imperative languages like Java, C], and Visual Basic employ object-orientation
to structure the program. As outlined in Chapter 1, this thesis will focus on object-
oriented imperative programming languages and will use their representation structure
to help describe side-effects in an abstract and composable manner.
2.1.2 Effect System Details
Having chosen to use the representation hierarchy as a basis of an effect system, the
next question is, how exactly are side-effects described in such a system? Side-effects
in such a system can be named using sets of objects which can be read or written.
An object named as an effect implicitly includes all the objects that form part of its
representation as part of the effect. Figure 2.1 shows a number of different objects,
illustrated as white circles, and the memory area they encompass when named; note
the implicit inclusion of their representations. A side-effect can, therefore, be abstracted by
naming one of the objects whose representation is accessed or modified instead of the
object itself. For example, in Figure 2.1 the effect represented by object 4 could be
abstracted as the effect represented by object 3 which includes it. This system facilitates
information hiding because even when an object’s representation is changed, the effect
of accessing or modifying that representation can still be summarized in the same way.
Precise details of how these effects can be expressed in a language is the subject of the
next chapter, Chapter 3.
Figure 2.1: Memory regions based on an object-oriented program’s representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representations. The inclusion of one triangle within another is the basis of effect abstraction and dependency detection.
Composition is the construction of a software assembly from smaller assemblies. An
effect system used in a language designed to support composition, as most modern
imperative programming languages do, needs to provide facilities for abstracting side-
effects so that implementation details are not inadvertently exposed through the effect
system. Composition, in the effect system I am proposing, is achieved through ab-
straction. Two different objects named as effects can be abstracted to a single object
which contains the two objects as part of its representation. This abstraction can be
used to hide implementation details while still providing correct effect information; the
abstraction only makes the effect less precise. This means that interference between
operations which employ the same component units can still be detected when desired.
How to do this is the focus of the next major section, Section 2.2.
It is important to note that other data side-effects, such as reading and writing physical
devices and other system resources, can be captured using my proposed effect system.
These devices and resources can be thought of as other objects which exist in the
system. Provided they are given a name they can be modelled in the effect system like
any other object. This is similar to the idea of memory mapped I/O in hardware. In
the rest of this thesis, reference will be made to side-effects, often with reference to the
objects or memory being read and written, but the reader is asked to remember that
these effects can also include other resources.
2.1.3 Effect System Complications due to Object-Orientation
Having chosen to focus on object-oriented imperative programming languages, there is
one issue which needs to be addressed in relation to method side-effects
— overriding. Method overriding can occur in two situations: replacing an existing
method implementation when subclassing, and implementing a method required by
an interface or abstract class. When overriding is coupled with dynamic binding the
result is that the most specialized implementation of a method is invoked by any given
call site. The downside of this is that it is not possible to determine which method
implementation will be invoked at any given call site at compile-time.
The inability to predict which method implementation will be invoked by a call site
complicates inter-procedural data dependency analysis since it becomes impossible to
determine which method’s signature needs to be examined to obtain the maximum
side-effects of invocation. One solution to this problem, which I have adopted in my
system, is to constrain the side-effects of overriding methods to include at most the
side-effects of the method being replaced. This means that the effects of the original
method declaration can be used in all effect computations and analyses without a need
to resolve exactly which implementation is being invoked. These effect constraints can
be verified at compile-time and methods which do not adhere to these constraints are
considered to be invalid by my system and will not be successfully compiled.
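Using the same placeholder attribute notation as the earlier sketch (again, not Zal’s actual syntax), the constraint can be pictured as follows: an override may declare at most the effects of the method it replaces, so every call site can be analyzed against the base declaration regardless of which implementation is dynamically bound.

class Shape
{
    // Base declaration: writes at most the representation of 'this'.
    [Writes("this")]
    public virtual void Scale(double factor) { /* ... */ }
}

class Circle : Shape
{
    // Legal override: its declared effects do not exceed those of Shape.Scale,
    // so callers reasoning against the base signature remain correct.
    [Writes("this")]
    public override void Scale(double factor) { /* ... */ }
}

// An override of Scale that also wrote some shared global state would exceed
// the effects declared on Shape.Scale and would be rejected.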
2.2 Detecting Data Dependencies
As discussed in Chapter 1, dependencies can take two forms: control dependencies and
data dependencies. Detecting control dependencies is relatively easy in modern im-
perative object-oriented languages due to the flow control abstractions they use. Data
dependencies are harder to detect because the major unit of control flow abstraction in
modern imperative object-oriented languages, the method, hides side-effect information
and so it is not possible to determine which pieces of data are read and written by a
method without examining its implementation.
The purpose of calculating side-effects was to facilitate the detection of data depen-
dencies. Having decided to use the representational hierarchy to express side-effects in
an abstract and composable manner, there needs to be some way to determine if two
different effects are disjoint or not.
When using the representation hierarchy as the basis of the effect system, side-effects
are expressed by naming objects whose representation is accessed, as discussed in Sec-
tion 2.1. Each object is directly part of at most one object’s representation. Naming
an object as being read or written implicitly names all of the named object’s repre-
sentation as being read or written. An object named as an effect, therefore, names an
entire subtree of objects in the representation hierarchy. Two objects, o1 and o2, when
named as effects are said to not overlap when o1 is not part of o2’s representation and
o2 is not part of o1’s representation (that is o1#o2 = ∅).
For two effect sets, E1 = {a1, . . . , an} and E2 = {b1, . . . , bm}, to be non-overlapping, it is
necessary to determine if all of the objects in effect E1 are disjoint from the objects in
effect E2. If ∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m} : ai#bj = ∅ then the effect sets are disjoint.
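A minimal sketch of these definitions, assuming a model in which each object records the object whose representation it is directly part of (its owner, null for objects outside any representation), might look as follows; this illustrates the overlap test, not the analysis my compiler actually performs.

using System.Collections.Generic;

// Illustrative model only: each node knows its direct owner in the
// representation hierarchy.
class Node
{
    public Node Owner;
}

static class Effects
{
    // o1 # o2 = ∅ : neither object lies inside the other's representation subtree.
    public static bool Disjoint(Node o1, Node o2)
    {
        return !InRepresentationOf(o1, o2) && !InRepresentationOf(o2, o1);
    }

    // True when 'obj' is 'root' itself or is (transitively) part of root's representation.
    static bool InRepresentationOf(Node obj, Node root)
    {
        for (Node n = obj; n != null; n = n.Owner)
            if (n == root) return true;
        return false;
    }

    // Effect sets are disjoint when every pair of named objects is disjoint.
    public static bool Disjoint(IEnumerable<Node> e1, IEnumerable<Node> e2)
    {
        foreach (Node a in e1)
            foreach (Node b in e2)
                if (!Disjoint(a, b)) return false;
        return true;
    }
}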
Effects are specified as a tuple consisting of a set of objects read and a set of objects
written. As previously mentioned, there are three different types of data dependencies
which need to be preserved during parallelization:
• Flow Dependence — a memory location written by one operation is read by a
subsequent operation; a read-after-write (RAW) dependence.
• Output Dependence — a memory location written by one operation is also
written by a subsequent operation; a write-after-write (WAW) dependence.
• Anti-Dependence — a memory location read by one operation is updated by
a subsequent operation; a write-after-read (WAR) dependence.
Given two arbitrary effect tuples ⟨r1, w1⟩ and ⟨r2, w2⟩, it is sufficient to show that r1
does not overlap w2, r2 does not overlap w1, and w1 does not overlap w2 to show that
no data dependencies exist between the code blocks generating the tuples.
It is important to note that just because two object effects are not disjoint does not
mean a data dependence must exist, only that a data dependence may exist. If a data
dependence may exist, then to ensure the correctness of the program, it is necessary to
conservatively execute the original sequential version of the code fragment rather than
a parallel version.
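Using the Node and Effects helpers from the sketch above, the dependence test for two effect tuples reduces to three disjointness checks, one per dependence type; a false result means only that a dependence may exist, in which case the original sequential order must be preserved.

// <r1, w1> and <r2, w2> are the read and write effect sets of two code blocks.
static bool NoDependencies(
    IEnumerable<Node> r1, IEnumerable<Node> w1,
    IEnumerable<Node> r2, IEnumerable<Node> w2)
{
    return Effects.Disjoint(w1, r2)    // no flow dependence (read-after-write)
        && Effects.Disjoint(w1, w2)    // no output dependence (write-after-write)
        && Effects.Disjoint(r1, w2);   // no anti-dependence (write-after-read)
}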
2.3 Task Parallelism
Having completed the discussion of how to abstract side-effects and reason about de-
pendencies in the first half of this chapter, the following two sections examine different
types of parallelism and how they can be detected using this information.
If a programmer has a set of operations that need to be performed, most imperative
programming languages default to having the programmer list the computations in some
arbitrary sequence with each statement terminated by an end of statement operator (for
example, a semi-colon in many ALGOL descendants), thereby imposing a total order on
the computations. In actuality, the dependencies between operations may imply only a
partial order to the steps. The difference between this partial order and the total order
represents potential for parallelism. This type of parallelism is commonly referred to as
task parallelism [9] — the distribution of the execution of different, disjoint operations
across different threads of execution. The partial order can be constructed from the
total order by computing the dependencies between computations.
2.3.1 Sufficient Conditions
The effect system outlined in Section 2.1 discussed how side effects could be captured
using an abstract hierarchical model of a computer’s memory. Consider the paralleliza-
tion of two statements S1 followed by S2. The dependencies between statements can
be calculated based on their side-effects. For the two statements to execute safely in
parallel, provided that no control dependencies which would prohibit parallelization
exist, it is sufficient to show that the following dependencies do not exist:
• flow dependency — the write effects of S1 do not overlap with the read effects
of S2;
• output dependency — the write effects of S1 do not overlap with the write
effects of S2; and
• anti-dependency — the read effects of S1 do not overlap with the write effects
of S2.
As was discussed in Section 2.2, the overlap of two effect sets can be determined by
considering effects pairwise from each set to determine if either one is included in the
other’s representation. If such an inclusion is found there may be a dependence between
the two statements. If no dependencies exist, the two statements may be safely executed
in parallel.
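Put together, the transformation this section licenses has the following shape. This is a sketch only; S1 and S2 stand for the two statements, r1, w1, r2, and w2 for their effect sets, and NoDependencies for the check sketched in Section 2.2. Later chapters describe when this test can be discharged statically and when it must be deferred to a runtime check.

// Run S1 and S2 in parallel only when their effects are provably disjoint;
// otherwise preserve the original sequential order.
if (NoDependencies(r1, w1, r2, w2))
{
    System.Threading.Tasks.Parallel.Invoke(() => S1(), () => S2());
}
else
{
    S1();
    S2();
}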
2.4 Loop Parallelism
Data parallelism is a form of parallelism which arises from the application of a stream
of operations to the elements of a data collection. Data parallelism is one of the ma-
jor sources of parallelism in programs and, more importantly, it is a source of scalable
parallelism. Scalable parallelism is parallelism in which the number of parallel threads of
execution is not limited by the structure of the program. The number of streams
of execution can be increased to improve performance or handle larger data sets as
more processing units become available. Exposing this kind of parallelism allows the
program to exploit additional processing units when they become available, thus allow-
ing the program to benefit from future increases in the number of computational cores
in a computer.
In modern, imperative, object-oriented programs, one of the major sources of data
parallelism is the loop; specifically, loops which operate on data collections. Loops
which operate on data collections can take many different forms and perform a number
of different operations. Most of the operations performed by loops operating on data
collections can be broadly classified into one of three groups as listed below. Note that
the definitions of map, reduce and filter used here are slightly different from those used
in functional programming owing to the nature of imperative programming and its
extensive use of shared mutable state.
• map — A per element operation which modifies the value of the element it
is applied to (it may take additional parameters other than the element to be
updated).
• reduce — An operation which takes two or more elements of a collection and
produces a single result (it reduces a collection to a value).
• filter — An operation which removes elements from a collection based on a
predicate.
Of these three operations, loops applying a mapping operation to elements of a col-
lection are one of the main forms of data parallelism found in mainstream imperative
object-oriented languages today [123, 10]. Loops applying filter and reduce operations
may be parallelized using some kind of tree reduction technique, but doing so would
depend on the properties of the operation being performed rather than just the side-
effects of the operation being applied, as is the case with the map operation. This thesis
focuses on the parallelization of mapping loops. Approaches for the parallelization of
reduction loops are discussed as part of the future work discussed in Section 9.3.3.2.
This section applies my proposed system to two loop parallelism patterns for loops
which apply a mapping operation to a collection: (1) data parallel loops where loop
iterations execute independently and are distributed across multiple processors and
(2) pipelining where the execution of a loop iteration is divided up into stages and
distributed across multiple processors. These patterns are based on each iteration of
the loop operating on a distinct data element. Loops which process the same data
element multiple times might still be parallelizable, but are not considered in this
section. Note also that some loops may employ a combination of map, reduce, and
filter operations and it may be possible to extract the map operation into a separate
loop to facilitate its parallelization. Automatically doing this extraction is beyond the
current state-of-the-art in the general case and is considered to be outside the scope of
this thesis.
2.4.1 Data Parallel Loops
The data parallelism pattern can be safely applied only if there are no inter-iteration
dependencies. This section begins by looking at the most declarative imperative looping
construct, the foreach loop, before moving on to more complex looping constructs.
2.4.1.1 Foreach Loops
Consider the simple data parallel loop shown in Listing 2.1:
foreach (T element in collection)
    element.operation();
Listing 2.1: A simple stereotypical data parallel foreach loop.
To execute the loop shown in Listing 2.1 in parallel, the elements of the collection
traversed by the loop must be distinct and there must be no dependencies between
the operations applied to the elements of the collection. If the operation applied to
each element did not write to any shared mutable state, the loop could run safely in
parallel, but it would not be able to perform any useful computational function since the
result could not be stored. Updates to arbitrary shared state could create dependencies
between the operations applied to each element. However, if the operation updates only
the element of the collection it is applied to then the following informal conditions are
sufficient to ensure that no inter-iteration dependencies exist in the loop:
• Loop Condition 1: there are no control dependencies which would prevent loop
parallelization,
• Loop Condition 2: the elements enumerated by the iterator supplying elements
to the loop body must be unique and have non-overlapping representations, and
• Loop Condition 3: the operation mutates only the representation of the ele-
ment on which it is invoked and it does not read the representation of any other
elements in the collection although it can read other, disjoint shared state.
These sufficient conditions will be formalized and proved in subsequent chapters.
Detecting the control dependencies which are the subject of Loop Condition 1 is a much
simpler problem than detecting data dependencies and is outside the scope of this work.
I do not claim any new contribution with respect to detecting control dependencies in
this thesis.
Loop Condition 2 can be satisfied in one of two ways. Either the uniqueness condition
can be dynamically tested just before loop execution or the programmer can assert
the uniqueness to hold. In the case of a programmer assertion, there is the option of
verifying the uniqueness invariant at runtime or turning off such assertion checking in
order to improve performance. If checked, such a uniqueness invariant could be verified
either at each insertion or just before the point at which the invariant actually needs to hold.
The uniqueness assertion can be made by annotating either the collection itself or its
enumerator (a collection may contain duplicates, but if its enumerator returns only
unique elements then the condition is still effectively met). The uniqueness annotation
could be placed on the collection class, or just on specific instances of that collection
class. Which of the above possibilities is used to ensure Loop Condition 2 is met will
depend on programmer and performance considerations — this thesis does not stipulate
a single mechanism.
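As an illustration of the dynamic-testing option, a reference-identity check over the collection could be run immediately before the loop; the sketch below is one possible formulation and is not part of the proposed language (IdentityComparer and ElementsAreUnique are names introduced here for illustration).

using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by reference identity rather than by any Equals override.
sealed class IdentityComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return object.ReferenceEquals(x, y); }
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

// A dynamic test for the uniqueness part of Loop Condition 2: true only if
// every element of the collection is a distinct object.
static bool ElementsAreUnique<T>(IEnumerable<T> collection) where T : class
{
    var seen = new HashSet<T>(new IdentityComparer<T>());
    foreach (T element in collection)
        if (!seen.Add(element))
            return false;   // the same object appeared twice
    return true;
}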
The final condition, Loop Condition 3, prevents flow and anti-dependencies from arising
between iterations. Loop Condition 2 ensures that all of the elements traversed by
the loop are unique and so only one iteration can update each item. If updates were
allowed to any other shared state, different iterations could update the same memory
location which would create an output dependence between iterations.
If any of the conditions is known not to hold, then the original sequential loop should
be executed to preserve program correctness.
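Putting the pieces together, the loop of Listing 2.1 could then be dispatched as sketched below. Parallel.ForEach is the .NET Task Parallel Library primitive and is an assumption about the runtime rather than part of the proposed language; ElementsAreUnique is the hypothetical check sketched above, standing in for whatever combination of static analysis, programmer assertion and dynamic testing establishes the three conditions.

// Loop Conditions 1 and 3 are assumed to have been established by analysis;
// Loop Condition 2 is checked dynamically here.
if (ElementsAreUnique(collection))
    System.Threading.Tasks.Parallel.ForEach(collection,
        element => element.operation());
else
    foreach (T element in collection)
        element.operation();   // fall back to the original sequential loop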
Loop Body Rewriting
Now that the sufficient conditions for the parallelization of a simple data parallel loop
have been explored, the question of how to generalize these conditions to handle arbi-
trary foreach loop bodies, like that shown in Listing 2.2, arises.
class Foo {
    ...
    int a;
    foreach (T elem in collection) {
        // sequence of statements possibly including local variable defs
        // and use of the variable a
    }
    ...
}
Listing 2.2: A generic foreach loop.
Fortunately, generalization to arbitrary loop bodies is a natural extension of the tech-
niques for the parallelization of simple foreach loops developed in the previous section.
The loop can be conceptually rewritten as shown in Listing 2.3.
class Foo {
    ...
    int a;
    foreach (T elem in collection) {
        elem.loopBody(this, a);
    }
    ...
}
Listing 2.3: A generic foreach loop with its body abstracted to a method on the element.
The loop body conceptually becomes a method on the element type T as shown in
Listing 2.4. Note that if such a rewrite were to actually be undertaken there could be
problems with access to private fields and methods of the class originally containing
the loop body.
As previously noted, it is not necessary to actually perform this transformation; only to
compute effects and perform dependency analysis as if it had been performed (details
are given when the application of these techniques to a real language and the
implementation of this extended language are discussed in later chapters).
class T {
    public void loopBody(Foo me, int a) {
        // same sequence of statements replacing
        // all elem by this and all this by me
    }
}
Listing 2.4: The body of a generic foreach loop extracted as a method on the element type T.
2.4.1.2 Enhancing the Foreach Loop
The techniques developed so far work only on data parallel loops in the form of a
foreach loop. These techniques cannot handle arbitrary loops as they provide no
means of associating iterations with distinct data elements. There are, however, some
loops expressed using other looping constructs which are data parallel in nature and
could conceptually be converted to foreach loops. This is not done in many cases
due to the semantic restrictions of foreach loops. Specifically, foreach loops provide
access to each element in a collection, but do not allow for the elements of a collection
to be updated in place. Further, foreach loops do not provide access to the index of
an element within a collection. An example of a for loop which cannot be re-written
as a standard foreach loop for these reasons is shown in Listing 2.5.
for (int i = 0; i < list.Count; ++i) {
    list[i] = func(i, list[i], ...);
}
Listing 2.5: A for loop whose body updates elements of a collection in place and makes use of the index of elements in the collection.
To support such cases, I have extended the syntax and semantics for foreach loops
over collections to support indexing and in place element updates to address these
problems. Listing 2.6 shows the syntax for expressing the loop shown in Listing 2.5 as
an enhanced foreach loop.
foreach (ref ElementType e at int i in list) {
    e = func(i, e, ...);
}
Listing 2.6: An example of the syntax for an enhanced foreach loop equivalent to the for loop in Listing 2.5.
The two new syntactic features introduced to expose the enhanced functionality of the
foreach loop are the ref keyword and the at <type> <identifier> construct. It is
important to note that these two new pieces of syntax are optional; either or both may
be omitted. In the case where both are omitted, the enhanced foreach loop operates
like a standard foreach loop.
The ref keyword before the element type indicates that assignment to e should be
treated as an in place update of the element in the collection. If ref is omitted, e acts
like the iteration variable in a standard foreach loop — assignment to the variable
updates only the variable and not the collection itself.
The at <type> <identifier> construct provides a means of exposing an index of some kind to
be associated with the elements being retrieved from the collection. The details of how
to implement such a loop are deferred to Chapter 6.
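To give an intuition for its semantics before then, one plausible desugaring of the loop in Listing 2.6 into an ordinary indexed for loop is sketched below; this is an illustration only and is not the implementation strategy of Chapter 6.

for (int i = 0; i < list.Count; ++i)
{
    ElementType e = list[i];   // the element bound to e for this iteration
    e = func(i, e, ...);       // the loop body, which may reassign e
    list[i] = e;               // ref semantics: write the element back in place
}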
2.4.1.3 The Enhanced Foreach Loop
The enhanced foreach loop retains the declarative style of the regular foreach loop.
There are two possible differences from the standard loop: the write-through update of
the collection denoted by the ref keyword, and the presence of an iteration index
variable. As was done for standard foreach loops, I begin by considering the trivial enhanced foreach loop
shown in Listing 2.7.
foreach (ref Elem e at Index i in collection)
    e.operation(i);
Listing 2.7: A simple enhanced foreach loop used to develop sufficient conditions for parallelism which can be generalized to all enhanced foreach loops.
It is interesting to note that the enhanced foreach loop can be parallelized using the
same sufficient conditions as the standard foreach loop. The index cannot be a source
of a loop carried dependence due to Loop Conditions 2 and 3. Also note that there
is no requirement that the index values be unique to the iteration, provided the index
values are not modified (this is due to Loop Condition 3). For example, the index could
be a key in a dictionary data structure; multiple values could have the same key.
As with the standard foreach loop, if any of the conditions is known not to hold, then
the original sequential loop must be executed to preserve program correctness.
2.4.1.4 Loop Rewriting
The foreach loop and the enhanced foreach loops considered in this section are declar-
ative in nature. The loop header explicitly identifies the collection the loop is operating
on. With this information, the side-effects of the loop body can be analyzed to try to
classify the operations the loop body is applying to the collection: map, reduce, and/or
filter. When a loop body is identified as a mapping operation, the sufficient conditions
just developed can be applied to determine if there is some exploitable data parallelism.
It may be determined, through analysis, that a loop body is actually a combination of
a mapping operation with some other operations such as reduction and filtering. Each
of these different operations could be a source of different kinds of parallelism and
would need to be parallelized differently, if at all. It may, therefore, be advantageous
to rewrite the foreach or enhanced foreach loop as a sequence of loops applying
individual operations, such as map, reduce, and filter, to the elements of the collection.
Doing so would make it easier to parallelize the loop. This rewriting process could, in
theory, be automated, but would require sophisticated semantic analysis and is beyond
the scope of the current work. A programmer could perform the rewriting and obtain
the benefits for important loops in the application.
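As a small illustration of such a rewriting, the sketch below splits a loop that mixes a map with a reduction into two separate loops; the Account class and its members are hypothetical, as in the earlier example. The first loop of the rewritten form is a pure mapping loop and can be parallelized using the conditions of Section 2.4.1, while the reduction remains sequential.

// Original: map and reduce mixed in one loop body.
decimal total = 0m;
foreach (Account a in accounts)
{
    a.ApplyInterest(0.05m);   // map part: updates the element
    total += a.Balance;       // reduce part: accumulates into shared state
}

// Rewritten by hand as a mapping loop followed by a reduction loop.
foreach (Account a in accounts)
    a.ApplyInterest(0.05m);

decimal total2 = 0m;
foreach (Account a in accounts)
    total2 += a.Balance;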
Unfortunately, not all looping constructs in mainstream imperative programming lan-
guages are as declarative in nature as foreach loops. for and while loops, for example,
do not make explicit the collections the loop is processing. Identifying the collection
being processed allows the operations being performed in the loop body to be classified
and appropriately parallelized.
It may be possible to perform an analysis on an arbitrary loop body to identify the
collection being traversed, if any, and then rewrite the loop as a foreach or enhanced
foreach loop enumerating the elements of the collection. Automating this analysis
and loop rewriting in the general case would be difficult due to the expressiveness and
flexibility of looping constructs, such as the for and while loops found in modern
imperative languages. Such automation is beyond the scope of this work. However,
it is possible that a programmer could rewrite many common for and while loops as
foreach or enhanced foreach loops and these could then be subjected to the paral-
lelization analyses previously discussed.
2.4.2 Pipelining
The data parallelism pattern for loop parallelization can be applied only to loops with-
out inter-iteration dependencies. In this section, I look at another style of data par-
allelism in the form of loop pipelining. Loop pipelining is a technique for staging the
execution of loop iterations so that no one stream of execution needs to run an entire
iteration. This can allow iterations of some loops, which modify shared mutable state,
to be partially overlapped. In this section, I also explore the pipelining of foreach and
enhanced foreach loops because of their declarative nature. As was the case with data
parallelism, it may be possible to rewrite for and while loops as enhanced foreach
loops to facilitate their pipelining.
2.4.2.1 Foreach Loops
As before, I begin by considering the pipelining of a traditional foreach loop. Consider
the stylized loop shown in Listing 2.8 where S_A through S_D represent statements or
groups of statements in the loop body.
foreach (ElemType elem in collection) {
    S_A;
    S_B;
    S_C;
    S_D;
}
Listing 2.8: A generic foreach loop body consisting of four statements.
The goal of pipelining would be to overlap the execution of parts of the loop body
shown in Listing 2.8 as shown in Figure 2.2. A pipeline stage is a single step in this
overlapped loop body execution. In the case of the example shown in Figure 2.2, S_A
through S_D are each pipeline stages.
Figure 2.2: The overlapping of the execution of loop iterations by using S_A through S_D as pipeline stages. Notice how the iterations ripple through the different stages as time progresses.
For the loop to continue to produce the same results as it would if it were executed
sequentially, certain constraints need to be applied to the side-effects of the pipeline
stages to ensure that no inter- or intra-loop iteration dependencies are violated. To
help visualize some of the dependencies which are allowed to exist between the stages
of a pipelined loop, consider the graph shown in Figure 2.3. In this graph each node
represents the execution of a pipeline stage for a specific iteration of a loop. For
example, the node in the top left corner of the diagram represents the execution of
the pipeline stage containing S_A for iteration number 1. The arrows between the
boxes represent dependencies and the blue highlights show which stages of the different
iterations would execute together.
Figure 2.3: Graph to help visualize dependencies permitted between specific executions of pipeline stages. S_A through S_D represent pipeline stages ordered from left to right and loop iterations are arranged from top to bottom.
The question which naturally arises is how to construct a maximally pipelined im-
plementation of a loop from an arbitrary loop body without violating any inter- or
intra-iteration dependencies. There are a number of existing techniques aimed at doing
just this [4]. These algorithms, which schedule loop bodies for pipelined execution, gen-
erally operate on some form of Data Dependency Graph (DDG). The precise details
of how dependencies are represented in the graph vary slightly from one scheduling
algorithm to another [4], but all require inter- and intra-iteration dependencies to be
identified. All of these DDGs can be built by determining dependencies between all
pairs of statements in the loop body. This means that the effect system I have proposed
can be used as a basis for the construction of a DDG suitable for the application of
these existing techniques.
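As a concrete illustration of the staged execution in Figure 2.2, the sketch below runs the four stages of Listing 2.8 as a pipeline of tasks connected by bounded queues. It is only a sketch under assumptions: BlockingCollection (System.Collections.Concurrent) and Task (System.Threading.Tasks) come from the .NET 4 Task Parallel Library rather than from the proposed language, and StageA through StageD are hypothetical methods standing in for S_A through S_D.

// Bounded hand-off queues between consecutive pipeline stages.
var aToB = new BlockingCollection<ElemType>(16);
var bToC = new BlockingCollection<ElemType>(16);
var cToD = new BlockingCollection<ElemType>(16);

// Each stage consumes elements from the previous queue, applies its
// statements, and passes the element on; iterations ripple through the
// stages as in Figure 2.2.
Task first = Task.Factory.StartNew(() => {
    foreach (ElemType elem in collection) { StageA(elem); aToB.Add(elem); }
    aToB.CompleteAdding();
});
Task second = Task.Factory.StartNew(() => {
    foreach (ElemType elem in aToB.GetConsumingEnumerable()) { StageB(elem); bToC.Add(elem); }
    bToC.CompleteAdding();
});
Task third = Task.Factory.StartNew(() => {
    foreach (ElemType elem in bToC.GetConsumingEnumerable()) { StageC(elem); cToD.Add(elem); }
    cToD.CompleteAdding();
});
Task fourth = Task.Factory.StartNew(() => {
    foreach (ElemType elem in cToD.GetConsumingEnumerable()) StageD(elem);
});
Task.WaitAll(first, second, third, fourth);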
Rather than presenting a set of sufficient conditions and algorithm for pipelining, as
was done for data parallel loops, I instead describe how dependencies are computed
and classified as inter- or intra-iteration dependencies so that an appropriate DDG can
be constructed for an existing pipelining algorithm. These procedures are presented in
an informal manner here and will be formalized and proved in subsequent chapters.
Dependency calculation for pipelining purposes begins in the same way as for data
parallel loops. The side-effects of the statements in the loop body are calculated using
the loop rewriting technique so that the this context refers to the representation of
the data element being processed. The following algorithm can be used to construct
a generic DDG for the purposes of loop pipelining provided the data elements being
processed by the loop are unique:
1. Consider all statements in the loop body pairwise (including each statement with
itself):
(a) Add an inter-iteration dependency if the two statements share a flow, out-
put or anti-dependence on the representation of any object other than the
data element being processed or any stack variable declared outside the loop
body’s scope.
(b) Otherwise, if the two statements are different, add an intra-iteration de-
pendency if the two statements share a flow, output or anti-dependence on
the representation of the data element being processed or any stack variable
declared inside the loop body’s scope.
The DDG which results from this algorithm can then be specialized to suit a particular
pipelining algorithm.
2.4.2.2 Enhanced Foreach Loops
Having considered the pipelining of the traditional foreach loop, the question of
pipelining the enhanced foreach loop arises. The enhanced foreach loop retains
the declarative style of the standard foreach loop. There are two possible differences
between the standard and enhanced foreach loops: the write-through update of the
collection denoted by the ref keyword, and the presence of an index of iteration vari-
able.
Data dependency calculation for enhanced foreach loops proceeds in the same way as
for standard foreach loops. The index can be treated like any other piece of shared
mutable state and the in place update of an element of the collection is treated like any
other update of shared mutable state which creates an inter-iteration dependency.
2.5 Summary
At the beginning of this chapter, I discussed the complications which arise when trying
to apply traditional data dependency calculation techniques to large general purpose
programs written using modern imperative object-oriented languages. The most sig-
nificant complication is the difficulty of performing may-alias analyses due to a lack of
language support. Because of these complications, I advocate modifying the language
to supply the required support.
Adding side-effects to method signatures makes dependency analysis possible if the
effect system used to describe the effects is correctly chosen. I have advocated an
approach of using the object representation hierarchy in the program to describe effects
in an abstract and composable manner. Under such a system, naming an object as being
read or written implicitly includes all the objects in its representation as well. I then
demonstrated that it is possible to compute the possible existence of data dependencies
by calculating the intersections of the effect sets of code fragments; this allows data
dependencies between arbitrary code blocks to be computed.
Using the effect system and dependency detection techniques, I developed sufficient con-
ditions for the application of a number of different parallelizing code transformations.
The patterns discussed included task parallelism, data parallelism, and loop pipelining.
The loop centric patterns were formulated for the declarative foreach loop found in
most modern imperative object-oriented languages. I also proposed an enhanced ver-
sion of the loop which would allow more for and while loops to be re-written using
a more declarative enhanced foreach loop construct. Doing so allows my techniques
to be applied to these loops to help find inherent parallelism which can be exploited
safely.
The ideas in this chapter were presented in an abstract form and together form the
core of this thesis. Subsequent chapters will refine them: how the effect system can
be realized as a language extension, how the sufficient conditions operate in terms of
an actual effect system, and how the effect system is applied to a real language.
Finally, a number of examples to validate these ideas will be presented.
Chapter 3
Realization of the Abstract
System
In the previous chapter, Chapter 2, I presented my basic ideas on how to reason about
side-effects and parallelism in an abstract and composable manner using the represen-
tation hierarchy present in object-oriented programs. The discussion in Chapter 2 was
high-level and abstract. Having proposed these ideas, the goal now is to refine them
into a form where they can be applied to a real programming language in Chapter 4;
the aim of this exercise is to validate my ideas by applying them to a set of
representative sample applications.
The goal of this chapter is to explain how the abstract ideas in Chapter 2 can be realized
using ideas from Ownership Types. The realization in this chapter is specific to modern
imperative object-oriented languages, but is not tailored to a specific language.
This chapter is divided into several parts. I begin with a brief overview of Ownership
Types for readers not familiar with the work. This is followed by a demonstration of
how the ideas presented in Chapter 2 can be realized using Ownership Types. With
the representation hierarchy realized using Ownership Types, I then show how the
effect disjointness operations discussed in Chapter 2 can be expressed using Ownership
Types and effect declarations. The final part of the chapter discusses how the sufficient
conditions for task, data, and pipeline parallelism can be expressed in terms of the
realization proposed at the start of this chapter.
3.1 Encapsulation Enforcement
To be able to realize my proposal from Chapter 2, I need a means of tracking the
representation relationships between objects in a program. Much of the previous work
on tracking representation relationships originates from efforts in the software verifi-
cation community to track and enforce object encapsulation [57, 5]. Recalling Listing 1.3
from Chapter 1, consider the code snippet shown in Listing 3.1:
private Object[] signers;
...
public Object[] getSigners() { ... return signers; }
Listing 3.1: A code snippet showing the field and method signature implicated in the Java 1.1.1 getSigners bug.
As was discussed in Chapter 1, despite the private annotation on the signers field shown
in Listing 3.1, it is possible for the getSigners method to return the object referenced
by this field. The private annotation on the field protects only the name of the field
and not the data it contains. This code was the source of the infamous getSigners
bug in Java 1.1.1 for precisely this reason [113].
The signers field and the object to which that field refers are, in the above example,
logically part of the object's representation. Protecting an object's representation
through access restriction and prohibition on external modification is called encapsu-
lation. Enforcing encapsulation can make it easier to write and debug object-oriented
programs [5, 57, 28]. Further, detecting encapsulation violations can help in identifying
potential bugs [113]. Because of these properties, encapsulation has been well studied
by the software validation and correctness communities. Some of this work has involved
the creation of type systems to validate or enforce encapsulation in modern imperative
object-oriented languages.
Validating or enforcing encapsulation generally requires each object’s position in the
representation hierarchy to be known. This can be achieved by having each object
track some combination of (1) which object’s representation it is part of and/or (2)
which objects are part of its representation. In Chapter 2, I proposed an effect system
based on the representation hierarchy present in object-oriented programs. A system
designed to check or enforce encapsulation could, therefore, also be used as a basis for
my proposed effect system which uses the representational hierarchy to facilitate effect
abstraction.
There are a number of different encapsulation tracking and enforcement systems doc-
umented in the literature (see Chapter 8 for more details), one popular branch of this
research is Ownership Types [28, 27, 25, 24, 49, 73, 74]. Ownership Types is a system
of type annotations used to track representation relationships [91, 28]. The effect sys-
tem I proposed in Chapter 2 could be realized using a number of these representation
tracking and enforcement systems, but I have chosen to use Ownership Types in this
thesis.
3.2 Ownership Types
There are a number of different kinds of ownership type systems. These different
systems have been designed for a number of different purposes, all related to program
validation. These type systems can be roughly divided into two groups: those which
derive from the original ownership formulation, using explicit ownership parameters, as
proposed by Clarke, Potter and Noble [28], and those which derive from the Universe
Types system proposed by Muller and Poetzsch-Heffter [88] which employ only relative
type annotations rather than explicit ownership parameters. I chose to base my effect
system on the systems employing explicit ownership parameters in the spirit of those
originally proposed by Clarke, Potter and Noble. This style of system captures more
detailed information than the relative ownership annotations found in Universe Types.
For a more detailed discussion of these different systems see Chapter 8.
In this section I present the ownership type and effect system I have developed to fa-
cilitate reasoning about parallelism. In this thesis, I am not claiming a contribution
of any new individual language features. Rather, my contribution is the unique com-
bination of these language features to facilitate reasoning about parallelism. There
are many different ownership systems documented in the literature which have been
proposed for many different purposes. The use of these systems for reasoning about
parallelism has been proposed, but not tested in the literature before this work. The
most closely related systems in the literature have focused on verifying the use of locks
in explicitly parallel programs. This is a very different problem from that of finding
inherent parallelism in a sequential program due to the simpler reasoning and infor-
mation tracking required. My contribution is using ownerships to detect and exploit
inherent parallelism.
In Ownership Types the tracking of representation is achieved through the use of ob-
ject contexts, hereafter referred to simply as contexts. As Clarke, Potter, and Noble
eloquently describe it, “Each object owns a context, and is owned by a context that it
resides within” [28]. An object stores the objects that are part of its representation in
its own context (referred to as the object’s this context). The objects in an object’s
this context are said to be owned by that object. Top-level objects not part of any
object’s representation are said to be owned by the special top context world.
Ownership and contexts are tracked by parameterizing the type declarations in the
system with context parameters. The first context parameter, by convention, is the
owning context. The other context parameters are used to construct types of other
objects used within the class which are not owned by the current object.
The ownership syntax I have used in my system draws upon that of several previous
ownership type systems proposed in the literature [27, 73, 49]. My system varies slightly
from these to suit my aims and to make it easier to parse the syntax when added to
languages which already make extensive use of most of the brackets available on the
standard keyboard. As an example of the ownership syntax I have chosen to employ,
consider the example of the simple stack implementation shown in Listing 3.2.
In Listing 3.2, three classes are declared: Object, Stack, and Element (which holds
one stack entry). The first context parameter on all of these types is the owner and
is the object whose representation an instance of the class belongs to. The second
parameter, present on the Stack and Element types, is the owner of the data stored in
the Stack. Without the second context parameter it would not be possible to construct
the type of the data field in the Element unless the data owner was assumed to be the
same as the owner of the stack. Notice that formal context parameters are specified in
[]s while actual context parameters are listed between ||s. This was done for clarity,
to simplify parsing, and to avoid ambiguity over the meanings of different kinds of
brackets.
public class Object[owner] {
    public bool equals[paramOwner](Object|paramOwner| other) { ... }
    ...
}

public class Stack[owner, dataOwner] {
    private Element|this, dataOwner| top;

    public void push(Object|dataOwner| data) {
        top = new Element|this, dataOwner|(top, data);
    }

    public Object|dataOwner| pop() {
        Object|dataOwner| toReturn = top.getData();
        top = top.getNext();
        return toReturn;
    }

    public bool contains[otherOwner](Object|otherOwner| other) {
        for (Element|this, dataOwner| i = top; i != null; i = i.getNext())
            if (i.getData().equals|otherOwner|(other))
                return true;
        return false;
    }
}

public class Element[owner, dataOwner] {
    private Element|owner, dataOwner| next;
    private Object|dataOwner| data;

    public Element(Element|owner, dataOwner| next, Object|dataOwner| data) {
        this.next = next;
        this.data = data;
    }

    public Object|dataOwner| getData() {
        return data;
    }

    public Element|owner, dataOwner| getNext() {
        return next;
    }
}
Listing 3.2: A simple stack implementation showing how I annotate classes with context parameters.
Notice, on the first line of the Stack class, how the elements of the stack are owned by the Stack's this context as they are part of its internal representation. The
owner of the element is recursively passed through the linked-list of elements.
I also allow methods to be parameterized with context parameters. This allows methods
to access data from contexts not directly referenced by the class. For example, in
Listing 3.2, the equals method on Object and the contains method on Stack both
take a context parameter that is used to allow the owner of the method's parameter to vary.
This provides greater flexibility and reduces the number of context parameters required
on the average type declaration. It is possible that the actual context parameters for
a method invocation could be inferred from the type parameters as is done in C# with
generic type parameters [78]. This feature has not been implemented in my system to
date, but it could easily be added.
3.2.1 Generics
Most modern imperative object-oriented languages employ some kind of generics to
make it easier to write correctly typed programs. The interaction between generic type
parameters and context parameters has been heavily studied by the Ownership Types
community and a number of techniques for intermixing them have been proposed, each
with its own advantages and disadvantages [95]. In my ownership implementation, I
treat generic type parameters as orthogonal to the ownership context parameters. This
means that I do not mix the two annotations; when you construct a type it may have
both type parameters and context parameters as shown in Listing 3.3.
public class Example<T, S>[owner, otherContext] { ... }
Listing 3.3: An example of a class with both generic type parameters and context parameters.
Generic Ownership [95] advocates a different approach in which ownership and type
parameters are intermixed to simplify program syntax. I chose an orthogonal approach
to simplify program parsing and facilitate retrofitting existing programs and program-
ming tools with ownership information. Further discussion of Generic Ownership and
other related work is undertaken in Chapter 8.
C#, like Java, allows constraints to be placed on generic type parameters to limit the
actual types which may be supplied for a given formal type parameter [78]. I rely on
these type constraints to supply additional ownership information about generic type
parameters, including their associated context parameters.
Listing 3.4 shows a revised version of the stack implementation shown in Listing 3.2.
This implementation uses a generic type parameter to store the type of the data ele-
ments stored in the stack. It is interesting to note that with the introduction of the
generic type parameter T, it is no longer necessary to explicitly include the dataOwner
context parameter on the Element implementation since it does not access or manipu-
late the state of the data referenced by the element.
public class Object[owner] {
    public bool equals[paramOwner](Object|paramOwner| other) { ... }
    ...
}

public class Stack<T>[owner, dataOwner] where T : Object|dataOwner| {
    private Element<T>|this| top;

    public void push(T data) {
        top = new Element<T>|this|(top, data);
    }

    public T pop() {
        T toReturn = top.getData();
        top = top.getNext();
        return toReturn;
    }

    public bool contains[otherOwner](Object|otherOwner| other) {
        for (Element<T>|this| i = top; i != null; i = i.getNext())
            if (i.getData().equals|otherOwner|(other))
                return true;
        return false;
    }
}

public class Element<T>[owner] {
    private Element<T>|owner| next;
    private T data;

    public Element(Element<T>|owner| next, T data) {
        this.next = next;
        this.data = data;
    }

    public T getData() {
        return data;
    }

    public Element<T>|owner| getNext() {
        return next;
    }
}
Listing 3.4: A simple generic stack implementation showing how I annotate classes with context parameters and how they interact with generic type parameters.
Listing 3.4 also serves as an example of how additional ownership information can be supplied for a generic type parameter using type constraints. The Stack class has a
constraint on its generic type parameter which allows the contents of the data elements
stored in the class to be accessed. This allows the contains method to be implemented
as shown.
3.2.2 Subtyping
Having added context parameters to the language, it is necessary to consider how
to handle these parameters when extending a class. I am not the first to consider
this problem; it has previously been explored in a number of other Ownership Types
systems [28, 27, 73]. It turns out that they can be handled in the same way as generic
type parameters. Formal context parameters in the subclass are mapped to formal
context parameters in the parent as part of the extension declaration. A child class may
have as many or as few context parameters as desired, just as is the case with generic
type parameters. There is, however, a requirement that the first context parameter,
the owner, remain the same between parent and child. This constraint is necessary to
guarantee consistency in ownership and facilitate enforcement of the invariants required
to reason about side-effects and parallelism. Listing 3.5 shows an example of how a
subclass’s context parameters are mapped to its parent’s context parameters.
public class Person[owner] {
    String|this| firstName;
    String|this| lastName;
    ...
}

public class Employee[owner, company] : Person|owner| {
    Company|company| employer;
    ...
}

public class Manager[owner, employeeOwner, company] : Employee|owner, company| {
    List<Employee|owner, company|>|this| employees;
    ...
}

public class SelfEmployed[owner] : Manager|owner, this, this| {
    ...
}
Listing 3.5: An example showing how a child class maps its formal context parameters to those of its parent.
In the code shown in Listing 3.5, note how a class’s formal context parameters are
mapped onto the formal context parameters of the class being extended. Child classes
can add additional context parameters, as demonstrated by the Employee and Manager
classes. A child class can also map parent context parameters to this or to the same
context parameter as demonstrated by the SelfEmployed class. Also of note is the
Employee list declaration: actual context parameters are supplied for both
the Employee type parameter and the List itself.
3.2.3 Type Compatibility
Whether a value of a given type can be used in a given position or not is dictated by a
language’s type compatibility rules. For example, the type of an assignment’s r-value
must be compatible with the type of its l-value. In an object-oriented language, as long
as the type of the r-value is equivalent to or a subtype of the l-value’s type then the
assignment is valid.
The addition of generic type parameters further complicates the rules for type com-
patibility. As an example of type compatibility consider the Node data type shown in
Listing 3.6. Logically, many programmers would expect that, given a type Parent and a
subtype of it called Child, a node holding an instance of Child could be used where
a node holding an instance of Parent is expected. Unfortunately, allowing this would
open a hole in the type system. Consider the code at the bottom of Listing 3.6 where
this is done. The assignment occurs, but the next line where a new node is created and
stored in the child’s next field is no longer correct. The parentEx variable allows the
assignment due to its type, but this would put a node holding a Parent into the next
field of a node only allowed to hold a reference to nodes holding a Child object. This
is inconsistent and causes the type system to break.
public class Node<T> {
    public Node<T> next;
}
public class Parent { ... }
public class Child : Parent { ... }
...
Node<Parent> parentEx;
Node<Child> childEx;
...
parentEx = childEx;
parentEx.next = new Node<Parent>(); // where variance causes a problem
Listing 3.6: An example showing how allowing variance in type parameters can create holes in the type system.
To solve this problem, most languages employing generic type parameters require strict
equality of type parameters when checking type consistency. The same problem that
occurs with generics can occur with context parameters and can be solved in the same
way. In my system, when checking types parameterized with context parameters, it is
first necessary to ensure the parameterized types are compatible. If generic type param-
eters are present they also need to be compared for compatibility once compatibility
of the parameterized types has been established. Assuming these checks succeed, the
context parameters then need to be checked for compatibility. This is done by checking
that context parameters in equivalent positions (taking into account the parameter
mappings between the superclass and the subclass) have the same name. This is the
same as generic type parameter compatibility and prevents the hole in the type system
demonstrated above from occurring.
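A compiler performing this check might, after establishing compatibility of the parameterized types and any generic type arguments, compare the actual context arguments position by position. The sketch below is a hypothetical, simplified form of that last step; ContextArgumentsCompatible and its inputs are illustrative names only, and both argument lists are assumed to have already been mapped through the superclass parameter mappings of Section 3.2.2.

using System.Collections.Generic;

static bool ContextArgumentsCompatible(IList<string> lValueContexts,
                                       IList<string> rValueContexts)
{
    // The context argument lists must have the same length once mapped.
    if (lValueContexts.Count != rValueContexts.Count)
        return false;
    // Strict, position-wise name equality, mirroring the rule used for
    // generic type arguments; no variance is permitted.
    for (int i = 0; i < lValueContexts.Count; ++i)
        if (lValueContexts[i] != rValueContexts[i])
            return false;
    return true;
}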
3.3 Side-effects Using Contexts
Having defined the operation of Ownership Types in my proposed system, it is now
necessary to consider how effects can be computed in terms of ownership contexts. This
is the key to realizing my ideas on how to reason about side-effects in an abstract and
composable manner.
In mainstream object-oriented programming languages, where objects are allocated on
the heap by default, modelling the operation of the heap is often enough to either en-
force or validate encapsulation. However, reasoning about dependencies and inherent
parallelism in a program requires both stack and heap effects to be considered as op-
erations on both can be sources of dependencies. Different imperative object-oriented
languages generally employ the same basic heap model; the entire heap is globally ac-
cessible throughout the program and allocation of heap memory is always permitted
as long as sufficient free memory exists. The consistent heap model means that the
operation of Ownership Types does not have to be significantly modified to operate on
different imperative object-oriented languages when being used to enforce or validate
encapsulation. Different language syntax and semantics do need to be accommodated,
but the basic operation of the ownership system does not need to change. Unlike the
heap model, the stack models employed by these languages vary greatly in the referenc-
ing and access models they support. Some languages, like Java and C#, heavily restrict
access to the stack while others, like C++, do not.
In this thesis, I have chosen to focus on languages which employ a highly restricted
stack model. For the moment I assume that there are no references to objects residing
on the stack allowed. This will be loosened in Chapter 4, when I apply these techniques
to a specific language. Heap effects can, therefore, be reasoned about independently
from stack effects. This means that all operations which read or write fields cause heap
effects and all operations which read or write stack variables cause stack effects. The
statements that make up a method body may have both stack and heap side-effects.
Each method has its own activation frame on the stack when it is executing and it
uses this frame to store its stack variables. As a result, the lifetime of stack variables
is limited to, at most, the lifetime of the method call in which they occur. Because
there are no references into the stack, all of the stack variables read and written in
a method body will cease to exist when the method returns. As a result, methods
need only declare heap effects as part of their signature since their stack effects cannot
be observed once the method returns. The techniques for computing stack and heap
effects are discussed in the following subsections.
3.3.1 Heap Effects
I have decided to use ownership contexts to express heap effects in my effect system.
Because data dependencies do not exist simply because of memory reads (at least one
write to a location involved in a dependency is necessary) I decided to express heap
effects as a set of contexts read and a set of contexts written. This design decision helps
to reduce the number of false-positive dependencies detected by my system compared
to one where the read and write effect sets are not distinguished. In a system with
a unified effect set, having two effect sets containing the same context means that a
possible dependency between the code blocks in question would have to be assumed
even if they both just happened to read the same memory location.
All heap read and write effects in imperative object-oriented languages originate as
reads or writes of the fields of heap-allocated objects. The rules for computing the
effects of expressions and statements which may contain reads or writes of fields ensure
that these effects are preserved and incorporated into the computed effects. An example
of an effect computation rule for reading a field is shown in the rule below. As shown,
reading a field of the current object causes a read of the current object’s this context.
Notice that when a field belonging to another object is read, the owner of the object
whose field is read is added to the effect set. Because the this context always refers
to the current object’s context, the this context of other objects cannot be directly
named. The read of another object's state is, therefore, abstracted as a read of the other
object’s owner. I call this process of abstracting effects raising.
$$\frac{\vdash f : \langle\langle \mathit{this} \mid \emptyset\rangle\ \langle \emptyset \mid \emptyset\rangle\rangle \qquad e \models T \qquad \vdash e : \varphi}{\vdash e.f : \varphi \cup \langle\langle \mathit{owner}(T) \mid \emptyset\rangle\ \langle \emptyset \mid \emptyset\rangle\rangle}$$

The general form of an effect rule $\vdash e : \langle\langle hr \mid hw\rangle\ \langle sr \mid sw\rangle\rangle$ can be read as: an expression
e produces heap read and write effects hr and hw as well as stack read and write
effects sr and sw (discussed in the next section) when evaluated. The $e \models T$
form means that the expression e has a type T, where T is a valid type. Finally, the
$\varphi \cup \langle\langle \mathit{owner}(T) \mid \emptyset\rangle\ \langle \emptyset \mid \emptyset\rangle\rangle$ in the above rule is the union of two effect sets. When
two effect sets are unioned, the result is the smallest set of effects created from the two
original effect sets which includes all of the effects named in the two original sets.
When considering two arbitrary contexts, their relationship can be said to be one of:
• equal (=) — they are one and the same
• dominating (<) — one context is directly or indirectly owned by the other
• disjoint (#) — they appear on different branches of the ownership tree
When unioning effect sets, it is desirable to eliminate contexts which are dominated by
other contexts in the same effect set to reduce the size of the set. It is also necessary
to ensure that no effect information is lost when such a union operation is performed.
To achieve these goals, the read and write sets of the two effects are unioned separately
using the algorithm shown in Listing 3.7.
union(set1, set2):
    result := set2
    outer: for item1 in set1
        for item2 in set2
            if item2 <= item1
                result := result - item2
            if item1 < item2
                next outer
        result := result + item1
    return result
Listing 3.7: Algorithm for unioning two sets of effects, set1 and set2. Note that the + and - operations are just set addition and subtraction.
Compound expressions which contain other expressions, such as the binary addition expression shown in the rule below, ensure that the effects of evaluating the nested expressions are included in the overall effects of the compound expression. This is done
by computing the effects of evaluating each of the component sub-expressions and then
unioning these effects using the algorithm just described. Note that if user-defined
operator overloading is permitted in the language to which this rule is applied, the
possible side-effects of the user-defined operation would also need to be included in the
effect set. The rule below shows an example of how the effects of a binary addition
operation are computed.
$$\frac{\vdash e_1 : \varphi \qquad \vdash e_2 : \varphi'}{\vdash e_1 + e_2 : \varphi \cup \varphi'}$$
In the same way, a statement and a block of statements recursively ensure their com-
puted effect sets contain all of the effects of their constituent parts. The rule below
shows how the effects for a block of statements are calculated.
$$\frac{\vdash s : \varphi \qquad \vdash \{\overline{s}\} : \varphi'}{\vdash \{s;\ \overline{s}\} : \varphi \cup \varphi'} \qquad\qquad \vdash \{\} : \langle \emptyset, \emptyset \rangle$$
As was discussed in Section 2.1, a number of complications arise in modern imperative
object-oriented languages which make it significantly harder to reason inter-procedurally
about side-effects. My solution to this problem is to include
method read and write effects as part of the methods’ signatures. If the programmer
supplies an effect declaration then it is necessary to ensure that the effect of the method
body is either equal to or a subset of the declared effects so that the declaration can
be relied upon during dependency analysis.
Constructor effects are handled in a similar way to method effects, but with one major
difference. When a constructor is invoked to initialize a newly allocated object, only
the object’s constructor has a reference to the new object. The constructor may pass
this reference to other constructors and methods it invokes, but the reference can only
escape once the constructor has returned. As a result of this, reads and writes of the
this context can be omitted from constructor read and write effects.
Listing 3.8 shows a stack implementation similar to the one used previously to illustrate
ownership syntax in Listing 3.4. This example adds the effect annotation syntax for
methods and constructors.
public class Object[owner] {
    public bool equals[paramOwner](Object|paramOwner| other)
            reads <this, paramOwner> writes <> { ... }
    ...
}

public class Stack<T>[owner, dataOwner] where T : Object|dataOwner| {
    private Element<T>|this| top;

    public void push(T data) reads <this> writes <this> {
        top = new Element<T>|this|(top, data);
    }

    public T pop() reads <this> writes <this> {
        T toReturn = top.getData();
        top = top.getNext();
        return toReturn;
    }

    public bool contains[otherOwner](Object|otherOwner| other)
            reads <this, dataOwner, otherOwner> writes <> {
        for (Element<T>|this| i = top; i != null; i = i.getNext())
            if (i.getData().equals|otherOwner|(other))
                return true;
        return false;
    }
}

public class Element<T>[owner] {
    private Element<T>|owner| next;
    private T data;

    public Element(Element<T>|owner| next, T data) reads <> writes <> {
        this.next = next;
        this.data = data;
    }

    public T getData() reads <this> writes <> {
        return data;
    }

    public Element<T>|owner| getNext() reads <this> writes <> {
        return next;
    }
}
Listing 3.8: A simple stack implementation illustrating method and constructor effect declarations. Method-level context parameters are specified using a notation inspired by C#'s method-level generic type parameters.
There are a number of things to note from the example shown in Listing 3.8. The
Element constructor has empty read and write effects because it assigns values to only
the fields of the object being created. Normally, an assignment to an object’s fields
would generate a write of the this context, but because the only references to the
new object exist within the constructor, the effects are omitted to simplify dependence
analysis. The Element’s accessor methods read the this context as expected. The more
interesting example is the Stack's contains method. The method calls the equals
method on the data Object of each element, passing otherOwner as a context parameter. The overall
effect of the method is that it reads the stack (this), the representation of the data
items stored in the stack (dataOwner), and the representation of the method parameter
(otherOwner).
Note that effects can be described at different levels of abstraction due to the hierar-
chical nature of contexts, as was shown when discussing the computation of the effect
of reading a field. Side-effects in terms of contexts can be thought of as similar to
physical street addresses. An effect could be described as being limited to a very pre-
cise location, for example 5th Avenue, Manhattan. It would also be correct, but less
accurate, to say that the effect is located in New York City or indeed in the United
States. Effect descriptions can be abstracted to make them more inclusive. Such an
abstraction might be to make it easier to name an effect or to avoid exposing implemen-
tation details through the effect description. This abstraction and information hiding
comes at the cost of reduced effect accuracy. Extending this analogy to the problem of
reasoning about the overlap of effect sets, if we were then to observe an effect occurring
in Boston we would know that the effect described as being in New York and the effect
in Boston must be disjoint because they are in different cities. If, however, the effect
occurring in New York City were abstracted to be an effect occurring in the United
States it would no longer be possible to determine that it was disjoint from the effect
in Boston.
3.3.2 Stack Effects
Stack effects in my system are specified as a set of local variables and method parameters
that are read, and a set of locals that are written. In languages where local variables
can be shadowed, it is necessary to rename these variables so that they all have unique
names. With unique variable names, the unioning of two sets of stack effects is achieved
by a standard set union of the named elements.
A stack read effect is caused by the read of a local (a local variable or method parame-
ter). The type rule below shows the rule for reading a local named x. The type rules of
the language recursively ensure that all compound expressions and statements in the
same lexical scope as the local’s declaration preserve the local whenever it occurs in
the stack effects of any of their component expressions.
$$\vdash x : \langle\langle \emptyset \mid \emptyset\rangle\ \langle x \mid \emptyset\rangle\rangle$$

In most imperative object-oriented languages, like Java and C#, local variables are
lexically scoped; they have a fixed life-time that is restricted to a written block of
program code. When the stack effects of a code block are calculated, only outer variables
are included in the computed effect sets; inner variables are excluded since they
are not externally visible.
Finally, if a stack local contains a reference to an object on the heap, then reading a
field of the referenced object via the local causes both a stack and heap effect. A read
of a field via a local causes a stack read effect of the variable and a heap read effect
of the owner of the object accessed. If a field is written to via a local, then the total
effect of the expression includes a read of the local and a write of the owning context
of the object modified.
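The following small example, written in the annotated style of Listing 3.8, illustrates how stack and heap effects combine when fields are accessed through locals; the Counter class and its cycle method are hypothetical, and the comments show the effects computed for each statement under the rules above.

public class Counter<T>[owner, dataOwner] where T : Object|dataOwner| {
    private Stack<T>|this, dataOwner| s;   // the stack is part of this object's representation

    public void cycle() reads <this> writes <this> {
        int n = 0;          // stack write of the local n; locals never appear in declared effects
        T item = s.pop();   // stack read of s and stack write of item; pop()'s heap effects
                            // (reads <this> writes <this> of the stack) are raised to reads and
                            // writes of this object's this context, since s is owned by this
        s.push(item);       // stack reads of s and item; heap reads <this> writes <this>
        n = n + 1;          // stack read and write of n only; no heap effect
    }
}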
3.4 Effect Disjointness
To determine if data dependencies are possible between arbitrary code blocks, it is
necessary to determine if their effects are disjoint or not. This section looks at how
disjointness is computed for heap and stack effects.
3.4.1 Heap Effect Disjointness
Consider how to determine the disjointness of two sets of heap effects; to do this, there
must be some basis to reason about the relationship between two contexts. Ideally,
this reasoning would be undertaken statically, that is before program execution. When
this is not possible, runtime testing of contexts to determine their relationships may be
desired to supplement the static reasoning. The construction of my ownership system
means that there are some statically known context relationships which apply to all
classes: (1) the world dominates all other contexts in the system and (2) the class’s
owner context dominates the class’s this context.
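As an illustration of how such static reasoning might be mechanized, the sketch below tests whether two heap effect sets are provably disjoint given a relation oracle. Relation, DefinitelyDisjoint and the relate delegate are hypothetical analysis-side names introduced for illustration; in practice the oracle would know little more than the two universal facts just listed plus any declared constraints.

using System;
using System.Collections.Generic;

enum Relation { Equal, Dominates, DominatedBy, Disjoint, Unknown }

// True only when every context named in the first effect set is provably
// disjoint from every context named in the second; any equal, nested or
// unknown pair may denote overlapping memory, so disjointness is not claimed.
static bool DefinitelyDisjoint(IEnumerable<string> effects1,
                               IEnumerable<string> effects2,
                               Func<string, string, Relation> relate)
{
    foreach (string c1 in effects1)
        foreach (string c2 in effects2)
            if (relate(c1, c2) != Relation.Disjoint)
                return false;
    return true;
}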
3.4.2 Facilitating Upwards Data Access
The ownership type system features discussed so far have focused on how to distinguish
between the representations of different objects. One important scenario not addressed
by the features discussed is how to allow more precise effects when reading shared
mutable state of an ancestor context (a context that directly or indirectly owns the
current context).
Figure 3.1 illustrates the ownership tree we would like to be able to support. We have
an object owned by a context c. From context c we wish to read data from context r.
If context r is not in scope (i.e. we cannot name it) then we must access r through
context b, an upward access.
Figure 3.1: Ownership relationships between contexts at runtime, used as an example of capturing context disjointness using sub-contexts.
A read of r processed by parent object b is likely to be summarized as a read of b if r is
not nameable. This abstraction would make a safe read of context r relative to context
c into a read of context b relative to context c which would be unsafe. To avoid this
problem, I introduced the notion of sub-contexts into my system to allow it to name
subparts of an object. In the above example, the context b could be partitioned into
sub-contexts; it would “own” a finite number of named sub-contexts b1 and b2. Using
these sub-contexts reading r could be summarized as a read of b1 rather than b itself.
If the context c is located in sub-context b2, then we could safely allow the read of b1
as it is disjoint from c.
Having decided to allow contexts to be partitioned into sub-contexts, the next questions
are: what is the scope of these sub-contexts, and how can they be declared or created?
It is my belief that these sub-contexts represent groupings of logically related objects
and that the creation of these sub-contexts is a design choice based on the design of
the object in which they are declared. I have, therefore, decided to only allow the this
context in an object to be subdivided into sub-contexts. Further, because the design
of sub-contexts is related to the implementation details of the object they are part of,
allowing other classes to name them directly would cause implementation exposure. The
ability to directly name an object’s sub-contexts is, therefore, restricted to the object
which declared them. This means that the sub-contexts of b cannot be named directly
from context c. The sub-contexts of b can be passed to c via context parameters and
so the read of r can still be summarized as a read of b1 despite the naming restrictions.
This would allow the read of context r to be safe provided that c is not in the same
sub-context as r.
All sub-contexts are dominated by their containing object’s this context. In addition
to their use with types, I also allow sub-contexts to appear in method read and write
effect sets. Sub-contexts named as effects are abstracted in the same way as the this
context of the object they are part of. This allows more precise effect information to
be captured when desired.
Within each class, the programmer can decide if they wish to declare sub-contexts and
if they do, they can declare as few or as many as they desire. In the extreme case,
each field might be given its own sub-context, but programmers would more commonly
create a sub-context to encapsulate a group of related fields. The more sub-contexts, the
more information that needs to be passed as context arguments on types; the creation of
sub-contexts is a trade-off between precision and complexity. Sub-contexts are limited
in scope to their class of declaration. To objects that are part of the sub-context’s
representation, the sub-context looks like any other context parameter, while to the
contexts which contain the sub-contexts in their representation, they are no different
than any other context. This limits the scope of the changes required to introduce the
sub-contexts to the class scope.
The idea of sub-contexts has been presented previously by other authors. Aldrich and
Chambers proposed a type system called Ownership Domains which made extensive
use of sub-contexts in the form of user declared domains in which objects stored their
representations [3]. Clarke and Drossopoulou used them for other purposes in JOE to
provide more precise effect information [27].
3.4.3 Context Constraints
Programmers may know, when designing a class or method that a specific relationship
should hold between two or more context parameters. Capturing this information
as part of the program provides more information which could be used to statically
determine the disjointness of side-effects in terms of contexts. One way to allow the
programmer to specify these relationships would be to allow them to specify constraints
on the relationship between the two context parameters. So rather than allowing any
context parameter(s) to be supplied, the compiler would require the contexts supplied
to satisfy the stipulated relationship constraints or the program would be invalid and
would fail to compile.
Syntactically, these constraints could take a similar form to the statically enforced
constraints that can be applied to generic type parameters in languages like Java and
C]. I have chosen to use a constraint syntax inspired by that used by C] for its generic
type parameter constraints. I allow for four possible relationships to be stipulated
between two context parameters: dominates (>), dominated (<), dominated or equal
(<=), and disjoint (#). A context dominates another if it is directly or indirectly an
owner of the second context. A context is dominated by another if it is directly or
indirectly owned by the second context. Two contexts are disjoint if neither is part
of the other’s representation. The Listing 3.9 shows examples of this syntax. In the
listing a is stipulated to be disjoint from b and c is stipulated to be dominated by a.
Note that the same syntax is used for constraints on both classes and methods.
public class TestClass[owner, a, b, c] where a # b, c < a {
    ...
}
Listing 3.9: An example of context constraint syntax on a class with context parameters.
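As noted above, the same constraint syntax can appear on methods that declare their own context parameters. The following sketch (the class, method, and type names are invented for illustration) requires the two contexts supplied at a call site to be disjoint:

    public class CopyExample[owner] {
        // Any call site must supply actual contexts for src and dst that the
        // compiler can show to be disjoint, or the program will not compile.
        public void Copy[src, dst](Data|src| from, Data|dst| to) where src # dst {
            ...
        }
    }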
3.4.4 Runtime Ownership Tracking
The statically enforced relationship constraints, introduced in the previous subsection,
allow programmers to document design decisions about the relationships between data
in their data structures. In practice, however, there are cases where the actual rela-
tionship between contexts is not known by the programmer or compiler before program
execution. Examples of this can be found in a number of common design patterns,
including iterator and visitor patterns. Most commonly, this problem occurs when
the contexts being tested are both context parameters, either on a class or method
declaration, without any context constraints. Any context could be supplied so the
relationship between the context parameters cannot be determined in the absence of
constraints. In cases like this, there are times when it would be desirable to be able
to test the relationship between context parameters at runtime. Such a runtime test
is needed to allow code to be conditionally parallelized based on possible context rela-
tionships. This conditional parallelization capability increases the amount of inherent
parallelism that can be successfully exploited by my proposed system. It is important
to note that, even if the code to be parallelized could touch millions of memory loca-
tions, a small fixed number of context tests can be used to quickly determine if these
memory locations are disjoint or not.
One of the simplest ways to implement a runtime ownership tracking system would be
to have each object in the system keep a pointer to its owner. In such a scheme the this
context of an object is represented as the object itself and, when present, subcontexts
are realized using objects. The ownership pointers added to objects would make it pos-
sible to traverse the ownership tree at runtime to determine the relationship between
contexts through pointer chasing. Runtime parallelism conditions most frequently in-
volve testing if two contexts are disjoint. To perform such a test at runtime using this
system, a walk from each of the nodes to the root (the world context) is performed.
The two contexts are disjoint if neither of the walks contain either of the contexts being
tested. If objects kept both a pointer to their owner and their depth in the ownership
tree, then testing for disjointness could be done in O(|n − m|) comparisons (where n and m are the depths of the two contexts being compared in the ownership tree).
This simple implementation is O(1) per object in terms of memory complexity and
runtime object creation overhead. However, executing a context relationship test using
the representation is O(n) in terms of execution time complexity, where n is the height
of the ownership tree, although most ownership trees tend to be relatively shallow, with a maximum height of approximately eight [1]. This means that in cases where
large numbers of objects are created relative to the number of disjointness tests, this
tracking system works well. In cases where the number of disjointness tests exceeds
the number of objects created, this ownership tracking system may not offer the best
performance.
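A minimal sketch of this owner-pointer scheme, assuming each object records a reference to its owner and its depth in the ownership tree (the class and member names are illustrative only):

    class OwnershipNode {
        public OwnershipNode Owner;   // null for the world context
        public int Depth;             // world is at depth 0

        public static bool AreDisjoint(OwnershipNode a, OwnershipNode b) {
            // Walk the deeper context up until both are at the same depth.
            while (a.Depth > b.Depth) a = a.Owner;
            while (b.Depth > a.Depth) b = b.Owner;
            // If either context dominated (or equalled) the other, the walk
            // would have made the two references identical; otherwise the
            // contexts lie in different branches and are disjoint.
            return a != b;
        }
    }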
The problem of tracking ownership and testing context relationships is analogous to
the type inclusion test problem described by Wirth [124] and efficiently solved by Co-
hen [30]. The type inclusion test problem is the problem of how to test if one type is a
subtype of another when compiling a program. Cohen’s solution to this problem was
to use Dijkstra views [39] to trade memory for reduced execution time complexity. This
solution can be applied to the problem of tracking and testing context relationships.
To use Dijkstra views to track ownerships, each object stores an array of pointers to
all of its ancestor contexts back to the root context, world. A disjointness test can
then be performed using the algorithm shown in Listing 3.10.
if |obj1.ancestors| == |obj2.ancestors|
    return obj1 != obj2
else if |obj1.ancestors| < |obj2.ancestors|
    return obj2.ancestors[|obj1.ancestors|] != obj1
else
    return obj1.ancestors[|obj2.ancestors|] != obj2
Listing 3.10: The algorithm for testing if two contexts are disjoint. Each object has a list of ancestor contexts which can be indexed into using []. The || operator has its usual mathematical meaning of magnitude.
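A C]-style rendering of this test, assuming each object stores its ancestors ordered from the world context down to its direct owner, so that the array length equals the object's depth (names are illustrative only):

    class OwnershipNode {
        public OwnershipNode[] Ancestors;   // Ancestors[0] is world

        public static bool AreDisjoint(OwnershipNode a, OwnershipNode b) {
            if (a.Ancestors.Length == b.Ancestors.Length)
                return a != b;                                // same depth: disjoint unless identical
            if (a.Ancestors.Length < b.Ancestors.Length)
                return b.Ancestors[a.Ancestors.Length] != a;  // does a dominate b?
            return a.Ancestors[b.Ancestors.Length] != b;      // does b dominate a?
        }
    }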
The Dijkstra views based algorithm provides an execution time complexity for testing
the relationship of O(1). This is at the cost of an object creation runtime complexity
of O(n) and a memory complexity of O(n), where n is the height of the ownership
tree. An intermediate trade-off would be possible if the ancestor array were replaced
with a skip list [96] or similar data structure which would provide execution time and
memory complexity of O(log n), where n is the height of the ownership tree. Table 3.1
summarizes the object creation execution time and memory overhead as well as the re-
lationship test execution time complexities for these three different methods of tracking
ownerships at runtime.
Implementation     Object Creation Execution Time   Object Memory Overhead   Relationship Test Execution Time
Pointer Chasing    O(1)                             O(1)                     O(n)
Dijkstra Views     O(n)                             O(n)                     O(1)
Skip Lists etc.    O(log(n))                        O(log(n))                O(log(n))

Table 3.1: Table showing the runtime complexity of object creation and relationship testing. Note that n is the height of the ownership tree.
Given the overheads involved with such a runtime system, there would likely need to
be some kind of switch on the virtual machine to enable/disable the use of the runtime
tracking system so that the overheads incurred can be avoided in cases where exploiting
conditional parallelism is not worthwhile. When the runtime system is turned off, all
disjointness tests would fail and only the parallelism shown to exist statically would be exploited.
As future work, it may be possible to implement a system where the runtime tracking
of ownerships is limited to a few classes with the class loader containing intelligence to
decide which classes should have the ownership tracking enabled. This would provide
the best-of-both-worlds in that the system could exploit conditional parallelism, but
avoid overheads on sequential code to a large degree. The details of such a system are
beyond the scope of the work in this thesis, but are a direct and logical extension of
the work presented herein.
3.4.5 Stack Effect Disjointness
Determining the disjointness of stack effects is significantly less complicated than de-
termining the disjointness of heap effects. The stack and heap are treated as separate
disjoint memories by my system, as previously discussed. The simplicity of stack ef-
fects stems from their lack of a hierarchy. To process stack effects I rely on each stack
location in scope having a unique name. In languages which permit shadowing of local
variables, my system would require local variable names to be renamed so that they
are unique. With unique names for stack locations, two effects are disjoint if they do
not name the same local variable or parameter. To determine if two code blocks can
execute in parallel, there are three tests to perform on their stack effects ⟨r1, w1⟩ and ⟨r2, w2⟩:
1. r1 ∩ w2 = ∅
2. r2 ∩ w1 = ∅
3. w1 ∩ w2 = ∅
Any overlap in these effects indicates a data dependency could exist between the two
code blocks and so my system conservatively concludes that they cannot be safely
executed in parallel.
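These three tests amount to set intersections over variable names; a small illustrative helper (not part of the Zal compiler itself) might look like:

    using System.Collections.Generic;

    static class StackEffects {
        // True only if no flow, anti, or output dependency is possible via the
        // stack, i.e. all three intersections are empty.
        public static bool MayRunInParallel(ISet<string> r1, ISet<string> w1,
                                            ISet<string> r2, ISet<string> w2) {
            return !r1.Overlaps(w2)
                && !r2.Overlaps(w1)
                && !w1.Overlaps(w2);
        }
    }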
3.5 Realizing the Sufficient Conditions for Parallelism
Having now realized the effect system proposed in Chapter 2 using Ownership Types, it
remains to map the sufficient conditions for parallelism proposed onto this effect system.
The previous chapter looked at three different parallelism patterns: task parallelism,
data parallel loops, and pipelining. This section is divided into three parts which look
at each of these parallelism patterns and refine the statements made in Chapter 2.
3.5.1 Task Parallelism
In Section 2.3, I stated that for two tasks to be safely executed in parallel, it was
sufficient to show that no flow, output, or anti-dependencies existed between the tasks
provided no control dependencies which would prohibit parallelization existed. It is
sufficient to show the following for each type of data dependency to show that it cannot
exist [15]:
• Flow Dependency: the read effects of a statement S1 are disjoint from the write
effects of a statement S2;
• Output Dependency: the write effects of a statement S1 are disjoint from the
write effects of a statement S2; and
• Anti-dependency: the write effects of a statement S1 are disjoint from the read
effects of a statement S2.
3.5.2 Data Parallel Loops
In the data parallelism pattern, loop iterations execute independently and are dis-
tributed across multiple processors [9]. In Section 2.4.1.1 the sufficient conditions for
employing this parallelism pattern on foreach loops were developed. In Section 2.4.1.3,
I argued that the same sufficient conditions could also be used to apply this parallelism
pattern safely to the enhanced foreach loops I developed. In this section, these suffi-
cient conditions are refined in terms of the ownership types system.
The conditions originally presented in Section 2.4.1.1 were formulated by considering
a simple foreach loop of the form shown in Listing 3.11 and generalized using a con-
ceptual loop rewriting.
foreach (T element in collection)
    element.operation();
Listing 3.11: A simple stereotypical data parallel foreach loop.
The sufficient conditions developed for the parallelization of data parallel loops were:
• Loop Condition 1: there are no control dependencies which would prevent loop
parallelization,
• Loop Condition 2: the elements enumerated by the iterator supplying elements
to the loop body must have disjoint representations, and
• Loop Condition 3: the operation mutates only the representation of the ele-
ment on which it is invoked and it does not read the representation of any other
elements in the collection although it can read other, disjoint shared state.
Now using the ownership type and effect system I have presented in this chapter, these
conditions can be restated as:
• Loop Condition 2: all of the elements processed by the loop body must have
disjoint this contexts and must share the same owner, and
• Loop Condition 3: the operation has a write effect of at most this and all read
effects are either dominated by this or are disjoint from the owner of the elements.
No writes to local variables declared outside the scope of the loop are permitted.
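As a hedged illustration of a loop satisfying these restated conditions (the classes and members are invented for the example), consider elements that each own their own representation and an operation whose write effect is at most this:

    public class Particle[owner] {
        private double x, y;
        // Write effect is at most this; all reads are dominated by this
        // (Loop Condition 3).
        public void Step() reads <this> writes <this> {
            x += 1.0;
            y += 1.0;
        }
    }

    public class Simulation[owner] {
        // Every particle is owned by this simulation and has its own,
        // disjoint this context (Loop Condition 2).
        private List<Particle|this|>|this| particles;

        public void StepAll() reads <this> writes <this> {
            foreach (Particle|this| p in particles)
                p.Step();   // iterations may be distributed across processors
        }
    }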
3.5.3 Pipelining
The data parallelism pattern for loop parallelization can be applied only to loops with-
out inter-iteration dependencies. Loop pipelining is a technique for staging the exe-
cution of loop iterations so that no one stream of execution needs to run an entire
iteration. This allows parallelism to be extracted from some loops which have inter-
iteration dependencies. In Chapter 2, I outlined an algorithm for computing the inter-
and intra-iteration dependencies for the statements in an arbitrary loop body provided
the data elements being processed are unique:
1. Consider all statements in the loop body pairwise (including each statement with
itself):
(a) Add an inter-iteration dependency if the two statements share a flow, out-
put or anti-dependence on the representation of any object other than the
data element being processed or any stack variable declared outside the loop
body’s scope.
(b) Otherwise, if the two statements are different, add an intra-iteration de-
pendency if the two statements share a flow, output or anti-dependence on
the representation of the data element being processed or any stack variable
declared inside the loop body’s scope.
Using my ownership type and effect system, this algorithm can be restated more for-
mally as:
1. Consider all statements in the loop body pairwise (including each statement with
itself):
(a) Add an inter-iteration dependency if either statement writes to a context
which is not dominated by the element’s this context or any stack variable
declared outside the loop body’s scope.
(b) Otherwise, if the two statements are different, add an intra-iteration depen-
dency if a flow, output, or anti-dependency exists between the two state-
ments via a context dominated by or equal to the element's this context or any stack
variable declared inside the loop body’s scope.
The dependencies computed using this algorithm can then be used to construct a DDG
for a specific pipelining algorithm. This algorithm will be further formalized and proved
in Chapter 5.
3.6 Summary
In this chapter I have shown how the effects system described in Chapter 2 can be
realized using Ownership Types. Ownership Types provides a framework for capturing
and validating representation relationships between objects using context parameters.
Effects can be expressed in terms of these ownership contexts and these can in turn be
used to reason about possible data dependencies by determining the overlap between
different effect sets. To parallelize code, it is necessary to have a basis for determining
the disjointness of contexts and proving the absence of data dependencies. There are
some context relationships intrinsic to the construction of my system which can be
exploited, but to further facilitate reasoning about the disjointness of effects, I have
also added static context relationship constraints and an optional runtime ownership
representation. These techniques will be applied to the C] language in the next chapter
and this language specific implementation of my ideas will be used to establish the
viability of my approach through the application of my ideas to representative sample
applications in Chapter 7.
Chapter 4
Application to an Existing
Language
Chapter 2 presented the basic ideas and intuition behind my proposed system and in
Chapter 3, I realized these ideas using concepts from Ownership Types. This development
was originally undertaken targeting a C-style object-oriented language, but without
a specific language in mind. To validate my proposals, I need to apply them to a
representative set of sample applications. To be able to do this, I need to apply my
proposed system for capturing side-effects and reasoning about data dependencies to a
real programming language. This chapter focuses on this application and the specific
technical issues encountered in doing so.
This chapter focuses on the application of my ideas to the syntax and semantics of
a real, mainstream, commercial programming language; namely the safe subset of C]
version 3.0 (the reasons for this choice are discussed in Section 4.2). I do not consider
how to apply my effect system to C]’s unsafe code blocks, except to say that all unsafe
blocks are assumed to read and write world so that no parallelization involving that
part of the code is performed. Possible avenues of future work which might allow my
techniques to be extended to the unsafe subset of C] are in Section 9.3.4.2.
This chapter contributes the design of an extended version of C] version 3.0 I have
called Zal (Zal meaning “dawn” in Sumerian, the earliest known written language).
The details of a compiler for and the design choices made while implementing that
compiler are the focus of Chapter 6. The focus of this chapter is on how to apply my
proposed system to all of the different syntactic and semantic features found in the C] language.
Ownership Types have traditionally been applied to the Java programming language
so there are a number of small technical contributions made through my discussion of
how to apply Ownership Types to these constructs.
4.1 Language Overview
As previously stated, Zal is an extension of the C] programming language with own-
ership and effect information. Zal consists of syntactic extensions to C] version 3.0, a
parallelizing compiler for the language, and a runtime ownership system. Figure 4.1
shows how these different components are used to compile a program and exploit its
inherent parallelism at runtime.
Figure 4.1: The different components that are used to compile a Zal program and exploit its inherent parallelism at runtime.
During program compilation, the Zal compiler applies the sufficient conditions for par-
allelism informally stated in Chapter 2, where appropriate. There are three possible
results: (1) the program fragment analyzed cannot be safely parallelized, (2) the program fragment can always be safely parallelized, or (3) the program may be safely parallelized
depending on the relationships between context parameters at runtime. In the third
case, the compiler generates conditionally parallel code; at runtime the relationships
between contexts are established to determine if parallelism can be safely employed.
As an example of how Zal works and of runtime context relationship testing, consider the
hashtable example shown in Listing 4.1.
public class HashTable<K,V>[o,k,v] where K <: Object|k|
                                   where V <: Object|v| {
    private K[]|this| keys;
    private V[]|this|[]|this| values;

    public void Accept(Visitor<V>|v| visitor) reads <this,v> writes <v> {
        foreach (Key|k| key : keys) {
            foreach (Value|v| value : values[key.HashCode])
                visitor.Visit(value);
        }
    }
}
Listing 4.1: An example of a hashtable which implements a visitor interface which allows the values to be traversed in parallel provided the k and v contexts are disjoint.
In the hashtable example shown in Listing 4.1, the hashtable has three ownership
parameters: the owner of the hashtable (o), the owning context of the hashtable keys
(k), and the owning context of the hashtable values (v). The traversal of the values can
only be safely undertaken when k is disjoint from v; otherwise, the Visit method could
modify the values of the keys and so disrupt the traversal of the hashtable. Listing 4.2
shows the conditionally parallel code generated by the Zal compiler for the hashtable
example shown previously in Listing 4.1.
[FormalContextParameters("o", "k", "v")]
public class Hashtable<K,V> {
    private IOwnership _Context_o;
    private IOwnership _Context_k;
    private IOwnership _Context_v;
    private K[] keys;
    private V[][] values;

    [ReadEffects("this", "v"), WriteEffects("v")]
    public void Accept(Visitor<V> visitor) {
        if (_Context_k.IsDisjointFrom(_Context_v)) {
            Parallel.ForEach(keys, (Key key) => {
                Parallel.ForEach(values[key.HashCode],
                    (Value value) => visitor.Visit(value));
            });
        } else {
            foreach (Key key in keys) {
                foreach (Value value in values[key.HashCode])
                    visitor.Visit(value);
            }
        }
    }
}
Listing 4.2: The C] implementation of the hashtable example shown in Listing 4.1.
Zal also allows programmers to statically constrain the relationships between con-
text parameters on declarations. The compiler verifies that these static context con-
straints are satisfied. Adding the constraint where k # v to the list of constraints on
the hash table shown in Listing 4.1 would allow the compiler to generate only the
Parallel.ForEach implementation of the loop. These constraints restrict the context
parameters that can be accepted by a type or method, but allow the programmer to
capture design intent in their programs.
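A sketch of how the hashtable declaration from Listing 4.1 might begin once this constraint is added:

    public class HashTable<K,V>[o,k,v] where K <: Object|k|
                                       where V <: Object|v|
                                       where k # v {
        ...
    }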
In Listing 4.2, notice how each of the context parameters has been translated into a
field; full details of this implementation of ownership tracking are provided in Chap-
ter 6. Also notice that the traversal of the hashtable’s values has been conditionally
parallelized. When the compiler undertakes parallelism analysis, it unconditionally
parallelizes as much of the program as possible. When unconditional parallelism is
not possible because the relationship between two or more context parameters is
unknown, the compiler emits a conditionally parallelized implementation; the parallel
implementation is used only when it is safe to do so.
Zal’s ownership system is best described as a descriptive hierarchical ownership system
with effects. This means that in the Zal ownership system, each object has a single,
immutable owner and that the side-effects of executable definitions are expressed using
the ownership system. This is broadly similar to the systems underlying MOJO [25]
and Deterministic Parallel Java [18] as well as the predecessors of these languages.
My proposed system also incorporates ideas from a number of existing
ownership systems including JoE [27] and Universe Types [88].
The novel distinguishing features of Zal are:
1. more expressive context relationship constraint clauses
2. dynamic ownership tracking with runtime context relationship checks
3. the handling of a number of different syntactic features including:
(a) static fields
(b) user-defined value types
(c) inline function declarations (lambda expressions)
(d) LINQ expressions
The dynamic ownership tracking and the syntactic features highlighted above are novel.
Context relationship constraint clauses have previously been proposed in JoE [27]
and MOJO [25]. JoE’s ownership constraint clauses had only a domination opera-
tor (<) [27]. MOJO proposed a disjointness operator and an intersection operator, but
did not include the domination operator [25]. Zal, therefore, builds on a number of
existing ownership systems and adapts ideas from them for reasoning about possible
data dependencies in C].
4.2 Choice of Language
Before proceeding to discuss the details of applying Ownership Types to the C] pro-
gramming language, a discussion of the reasons for choosing it as the base language is
warranted. As was stated in Section 1.2, this thesis is focused on reasoning about par-
allelism in mainstream strong, statically-typed, imperative, object-oriented languages.
This focus limits the choice of language, but there were still several alternatives to
consider.
The first consideration was that this thesis is focused on developing techniques for
reasoning about parallelism which can be used by a large proportion of programmers
to write general purpose programs of varying sizes. This removed many specialist and
research languages, such as Scala, from consideration.
Secondly, as discussed in Section 3.3, my realization relied on a restricted stack refer-
encing model. This ruled out languages which do not employ a restricted stack model,
for example C++. This left Java, C], and Visual Basic.NET as the main languages for
consideration. Note that the “unsafe” portion of the C] language does not satisfy the
restricted stack referencing model adopted during the realization of the effect system.
Fortunately, unsafe code segments must be clearly identified in C] and require a spe-
cific compiler flag to be set. As a result, most C] programs do not make use of these
“unsafe” features and so they satisfy the adopted model. C] has the added advantage
of making it easy to explore expanding the system to less constrained stack models,
through its unsafe language features, in the future (see Section 9.3.4.1 for a discussion
of this future work).
Out of the three remaining languages (Java, C], and Visual Basic.NET), the best
studied in existing academic literature at the time was Java. The use of Java in the
literature means that there is a large collection of academic benchmarking suites. Own-
ership type systems have generally been validated using extended versions of the Java
programming language. Java also has the advantage of having several well studied,
free, open-source implementations available.
Despite Java’s advantages, it also has some serious disadvantages. First, and most
serious, is the retrofitted generics implementation in Java. Generics allow user-defined
types to be parameterized with additional types so that objects can be specialized
at creation time. When generics were added to the Java programming language, the
designers chose to maintain backwards compatibility with previous versions of the lan-
guage in the virtual machine. This meant that the generics were implemented using
type erasure; the generic type information is erased by the compiler and it is not avail-
able at runtime. This is not ideal because the ownership type parameters required
by my realization would be implemented using the same infrastructure as the generic
type parameters. As discussed in Section 3.4.4, a runtime ownership tracking system
is necessary to help facilitate reasoning about the disjointness of contexts.
C] and Visual Basic.NET, unlike Java, both persist generic type information through
to the virtual machine at runtime. The official compilers for these languages are not
publicly available, but the language specifications are freely accessible. C] does not
have the same support in the academic community as Java and so finding benchmarks
and other similar tools is more difficult, but there is a large number of programs written
in C]. The similarity of the core syntax used by Java and C] also allows for relatively
easy porting of existing Java programs to C].
Ultimately, I decided to work with C] because it offers a number of advanced language
features, such as LINQ and user-defined value types, not found in Java and because
it persists generic type information through to runtime. C] also boasts a familiar
syntax and a large following of programmers writing general purpose applications.
The existing C] expertise within the Microsoft QUT eResearch Centre and QUT's
support for research involving the language also influenced my choice of language. My
system could have been applied just as well to Java or Visual Basic.NET. My extended
version of the C] version 3.0 language developed as part of this thesis, Zal, is presented
informally in the remainder of this chapter and formalized in the next.
4.3 Syntactic Features
Ownership Types research has traditionally been undertaken using the Java program-
ming language. The C] programming language used in this project has a number of
syntactic constructs not found in Java. There is no literature documenting the ap-
plication of Ownership Types to C] itself or to some of its more advanced language
features.
All of the basic syntactic changes proposed in Chapter 3 can be applied directly to the
C] language. This section focuses on the annotation of the more advanced syntactic
language features, not previously discussed, with ownership and effect information.
4.3.1 Basic Syntax
In Chapter 3 I discussed how my proposed effect system could be realized using own-
ership types. This discussion focussed on classes and methods, the core constructs of
most object-oriented languages. In the course of this discussion I presented several dif-
ferent code samples using a syntax similar to that of Java and C]. The syntax presented
in these sections was not extensively discussed and the code snippets were primarily
used to provide clarifying examples for the discussion. Having decided to apply my
proposed type and effect system to C], I begin by revisiting changes required to classes
and methods to implement my proposed type and effect system. The remainder of this
chapter discusses the syntax and semantics of Zal.
4.3.1.1 Class-Level Context Parameters
Classes, in an ownership system, are parameterized with formal context parameters.
The first formal context parameter is special and is referred to as the class’s owner. An
object’s owning context specifies which object’s representation it is part of. The other
formal context parameters are all used to construct types within the class definition.
In Zal, a class's formal context parameters are listed as part of the class definition
immediately after the class name and any generic type parameters. To facilitate parsing,
I have chosen to delimit the list of formal context parameters using []s. Listing 4.3
shows a class parameterized with several different formal context parameters.
public class Example[owner, a, b] {
    ...
}
Listing 4.3: An example of a class parameterized with formal context parameters owner,a, and b.
When a type is named, actual context parameters must be supplied for the formal
context parameters. The actual context parameters may be the special contexts this
(the context storing the representation of the current object) and world (the special
top context) or a context parameter currently in scope. The actual context parameters
on a type reference are listed between ||s immediately after the type name and any
generic type parameters. Listing 4.4 shows the naming of the type of a field within a
class using actual context parameters.
public class Example[owner, a, b] {
    public Example|this, world, a| next;
}
Listing 4.4: An example of a field with actual context parameters this, world, and a.
Last, but not least, when a class with context parameters is extended, actual context
parameters must be supplied on the type reference just as is done with generic type
parameters. An example of this is shown in Listing 4.5.
public class Parent[owner, a] { ... }
public class Child[owner, data, source] : Parent|owner, source| { ... }
Listing 4.5: An example of a class extending a class which is parameterized with contextparameters.
4.3.1.2 Method-Level Context Parameters
Like some previous ownership systems [27, 25, 49], I allow methods to be parameterized
with formal context parameters. Unlike the class formal context parameters, there is
no special meaning attached to the first context parameter declared on a method. The
context parameters on a method allow the method to construct types that cannot be
constructed using the containing class’s context parameters alone. A method’s formal
context parameters, if any, are listed between []s following the method name and any
generic type parameters. An example of a method with formal context parameters
is shown in Listing 4.6.
public class Example[owner, a] {
    public void method[data1,data2](Object|data1| dataObj1,
                                    Object|data2| dataObj2) {
        ...
    }
}
Listing 4.6: A method definition with formal context parameters.
When a programmer invokes a method with formal context parameters, they supply the actual context parameters explicitly at the call site.
The actual context parameters are listed in the same positions as the formal context
parameters, but between ||s.
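As an illustrative sketch (the variable names are invented), a call to the method declared in Listing 4.6 supplying this and world as the actual context parameters might look like:

    Example|world, this| ex = new Example|world, this|();
    Object|this| localData = new Object|this|();
    Object|world| sharedData = new Object|world|();

    // this and world are supplied for the formal context parameters
    // data1 and data2 respectively.
    ex.method|this, world|(localData, sharedData);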
When invoking a generic method in C], the compiler can often infer the type param-
eters from the types of the actual parameters supplied at the call site. Listing 4.7
shows an example of a generic method to concatenate the string representations of
two parameters which makes use of this type parameter inference. The same inference
techniques could be used to infer the actual context parameters to a method when
actual context parameters are not supplied by the programmer. This would reduce the
number of annotations that need to be supplied by the user without reducing the flex-
ibility or expressiveness of the system. The prototype implementation of Zal described
in Chapter 6 does not currently support such context parameter inference, but adding
this support is only a matter of additional engineering work in the compiler.
public string Concat<T>(T a, T b) {
    return a.ToString() + b.ToString();
}
...
int val1 = 1, val2 = 2;
string result2 = Concat(val1, val2); // the int type parameter is inferred
Listing 4.7: An example of the inference of type parameters to a generic method.
4.3.1.3 Context Constraints
Reasoning about parallelism requires reasoning about the disjointness of side-effects. In
my proposed effect system which uses ownership types, reasoning about the disjointness
of effects requires some basis for reasoning about the relationships between two contexts.
As was discussed in Section 3.4.1, there may be times when a designer of a class or
method knows that a specific relationship should hold between two or more context
parameters. Capturing this knowledge provides additional information which can be used
to help determine the relationships between context parameters.
In Zal, definitions which are parameterized by formal context parameters may have one
or more constraint clauses describing the relationships amongst the declared context
parameters and between the declared context parameters and any other contexts in
scope. C] uses where clauses to put constraints on generic type parameters and for
consistency, I have adopted a similar notation for context parameters.
There are four context relationships, shown in Table 4.1, which can be stipulated be-
tween two context parameters in a constraint clause. These four context relationship constraints are based on the relationships between contexts which, when established,
are sufficient to facilitate the safe exploitation of inherent parallelism as was discussed
in Sections 3.4 and 3.5. The nested hierarchy of contexts in the ownership system used
by Zal results in contexts being either nested inside one another or disjoint from one
another. The relationships in Table 4.1 allow these relationships to be asserted by the
programmer.
Relationship              Description
dominates (>)             The context on the right is part of the representation of the context on the left.
dominated (<)             The context on the left is part of the representation of the context on the right.
dominated or equals (<=)  The context on the left is either part of the representation of the context on the right or is the same context as that on the right.
disjoint (#)              The two contexts are in different branches of the ownership tree.

Table 4.1: The four context relationships which can be stipulated in a Zal context constraint clause and their meanings.
When supplying context constraints on a definition with formal context parameters,
the constraints appear at the end of the definition immediately before the opening {
of the definition body or the terminal ; just like a generic type parameter constraint
in C]. Each constraint relationship is listed in its own where clause as is done with
generic type parameters. Listing 4.8 shows an example of this constraint syntax on a
class definition.
public class ConstraintExample[owner, a, b, c] where a # b where c <= a {
    ...
}
Listing 4.8: An example of a class annotated with context parameters and contextconstraints using Zal’s syntax.
Note that when dealing with nested definitions, like a method inside a class, at least
one of the two contexts in a constraint must be in the list of formal context parameters
declared on the definition of the method. For example, a method cannot specify a
constraint between two of its containing class’s context parameters.
When actual parameters are supplied for formal context parameters with stipulated
constraints, the compiler tries to statically verify that the constraint is satisfied. If the
compiler cannot determine that the constraint is satisfied, the program is considered
invalid and will not compile.
4.3.1.4 Method Effect Declarations
In Section 3.3, I discussed the need to declare effects on method definitions. The main
reason for this was to facilitate reasoning about method side-effects in the presence of
overriding. By having overriding and overridden methods declare their effects they can
be checked easily to make sure that the overriding method’s effects are a subset of the
side-effects of the method being overridden. The method body of a method with a set
of declared effects must not exceed the effect declaration, that is the method body’s
overall effects must be a subset of those declared.
In C], unlike Java, methods are not virtual by default. Only methods which are explic-
itly marked with the abstract, override, or virtual keywords in their declaration
can be overridden. This means that methods which are not defined to be abstract,
virtual, or override do not need to declare their effects; the effects can simply be
computed from the method body. Effects may still be declared on methods when not
required and, where supplied, the effect of the method body will be validated against
the declared effects. By not requiring effect declarations on all methods, the number of
annotations which need to be added to a program is reduced which makes the system
easier to use. To ensure effects can still be calculated even when the method source is
not available, for example when compiling against a Dynamic-Link Library (DLL), the
effects of a method are stored as part of the method's signature after compilation regard-
less of whether they were explicitly declared or not. Allowing effect declarations to be
selectively omitted serves to reduce the programmer’s annotation burden. However, to
facilitate parallelization in the presence of code reuse, the use of effect declarations on
public Application Programmer Interfaces (APIs) would be necessary and code analysis
tools could be used to help enforce this.
Data dependencies exist only when there is an update to shared mutable state; over-
lapping access to unmodified shared mutable state is safe and does not result in a data
dependency. I have, therefore, decided to list the sets of contexts read and written sepa-
rately to reduce the number of false-positive possible data dependencies being detected
by the system. If I chose to amalgamate the read and write effects into a single effect
declaration then overlapping reads would be detected as possible data dependencies.
The effect sets are listed on the method definition after the formal call parameters and
before any context constraints. An example of the syntax is shown in Listing 4.9.
public class MethodExample[owner] {
    public virtual void operation[a, b, c]() reads <this,a,b> writes <c>
            where a # b {
        ...
    }
}
Listing 4.9: An example of an instance method annotated with effect declarations andcontext constraint clauses.
Default Effects
If a method invoked from a program written in Zal has no effect information, for
example a method written in C], then the compiler assumes that the method reads and
writes the world context — anything on the heap. This is the safest assumption as it
ensures no dependencies are possibly violated, but hinders parallelization.
4.3.2 Subroutine Constructs
The discussion of how to realize my effect system in Chapter 3 discussed instance
methods only on objects. C], like many other languages, has a number of additional
syntactic constructs to define subroutines. Like methods, these mechanisms need to be
annotated with effect information when they are declared to be overriding or virtual.
These effect annotations can be used to ensure effect consistency in the presence of
overriding. The following subsections discuss C]’s subroutine abstractions and their
annotation with ownership and effect information.
4.3.2.1 Properties
C] properties are syntactic sugar designed to simplify the task of writing and using
simple object getter and setter methods. The property is used like a field and the
compiler handles converting the code to call the property’s get or set method based
on how the property is used. Listing 4.10 is an example of an Employee class with a
firstName field that is made readable and writeable through the FirstName property.
public class Employee
{
    private string firstName;

    public string FirstName
    {
        get {
            return firstName;
        }
        set {
            if (value == "")
                throw new InvalidOperationException();
            firstName = value;
        }
    }
}
Listing 4.10: An example of a property being used to read and write a field.
Listing 4.11 shows how the property in Listing 4.10 would be implemented using meth-
ods.
public class Employee
{
    private string firstName;

    public string getFirstName() {
        return firstName;
    }

    public void setFirstName(string value) {
        if (value == "")
            throw new InvalidOperationException();
        firstName = value;
    }
}
Listing 4.11: Code snippet which shows how the properties in Listing 4.10 could beimplemented using methods.
While properties are syntactic sugar for accessor and mutator methods, the C] lan-
guage specification does not allow properties to have either generic type parameters or
call parameters other than the implied value parameter for the setter. Like methods,
properties can be overridden when the original definition was marked virtual.
As was discussed in Section 4.3.1.4, programmers are required to declare the read and
write effects of overriding and virtual methods to facilitate effect consistency checking
during overriding. I made the design decision to add ownership and effect information to
properties in a manner consistent with the existing C] design decisions. To be consistent
with the lack of generic type parameters on properties, I decided not to allow properties
to be parameterized with formal context parameters. I also decided that the effects of
reading or writing a property must be declared if the property is declared to be abstract,
override, or virtual. This ensures that the effects of the accessors can be kept consistent
during overriding. Effect consistency during overriding means that the effects of the
overriding get accessor do not exceed the declared effects of the overridden get accessor
and the same for the set accessor. In non-virtual properties, the effect declarations
are optional as the effects can be computed from the accessor bodies and the property
cannot be overridden. In the Zal syntax, these declarations take the form of read and
write effect sets attached to the property’s get and set accessors, where present.
Listing 4.12 shows the syntax chosen for accessor effect declarations. When effect
declarations are supplied, it is necessary to ensure that the accessor body effects are
consistent with their declared effects and the effects of any accessors overridden just as
it is with methods as was previously discussed.
public class Employee[owner]
{
    private string firstName;

    public string FirstName
    {
        get reads <this> writes <> {
            return firstName;
        }
        set reads <> writes <this> {
            if (value == "")
                throw new InvalidOperationException();
            firstName = value;
        }
    }
}
Listing 4.12: An example of a property annotated with read and write effects.
Automatic Properties
To further reduce the amount of code needed to write simple properties which just
expose an underlying field, C] version 3.0 introduced automatic properties [78]. With
an automatic property, the field the property exposes becomes implicit rather than
explicit as shown in Listing 4.13.
public class Employer
{
    public string FirstName
    {
        get;
        set;
    }
}
Listing 4.13: An example of an automatic property which does not require an explicitfield or accessor implementations.
The implementation details of automatic properties are well known and defined in the
language specification and so effect declarations are not required for automatic prop-
erties, even if they are declared to be abstract, override, or virtual. All automatic
properties have the following read and write effects depending on whether they are
instance properties or static properties:
• get - instance: reads this and writes nothing
        static: reads the current class's static context (see Section 4.4) and writes nothing
• set - instance: reads nothing and writes this
        static: reads nothing and writes the current class's static context (see Section 4.4)
Omitting the effect declarations on automatic properties maintains the simplicity of
the syntax (one of the reasons it was added to the language) without losing any effect
precision; the effects can be inferred from the lack of an accessor implementation. It
is important to note that even though the automatic properties do not carry effect
declarations, their effects must still be consistent in the presence of overriding.
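As a brief illustration (the class here is invented), an instance automatic property therefore carries the following fixed accessor effects:

    public class Person[owner] {
        // get: reads <this> writes <>      set: reads <> writes <this>
        public string Name { get; set; }
    }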
4.3.2.2 Indexers
Indexers are another piece of C] syntactic sugar which are designed to allow access to
an object’s state using an array index style syntax with any number and type of indices.
Indexers are written in a very similar style to properties using get and set accessors.
Listing 4.14 shows an example of an indexer. This indexer takes a day and a calendar
and uses them to convert the day name into a number.
public class Calendar {
    public int firstDayOfWeek = 0;
}

public class Week {
    public string[] days = {"Sunday", "Monday", "Tuesday", "Wednesday",
                            "Thursday", "Friday", "Saturday" };

    public int this[string day, Calendar cal] {
        get {
            int dayIndex =
                Array.LastIndexOf(days, day) - cal.firstDayOfWeek;
            return dayIndex < 0 ? dayIndex + 7 : dayIndex;
        }
    }
}
Listing 4.14: An example of an indexer used to convert day names into numerical daysof the week from a defined starting point.
As was the case with properties, the accessors in abstract, override, or virtual indexers
are required to have effect declarations to ensure consistency during overriding. Acces-
sors in non-virtual indexers may declare their side-effects, but the effect declaration is
not required as the effects can be inferred from the accessor’s implementation.
Like properties, indexers cannot be parameterized with generic type parameters. In
deciding whether to allow indexers to be parameterized with context parameters, I
have again chosen to remain consistent with the existing C] language. As a result,
Zal does not allow indexers to be parameterized with context parameters. Listing 4.15
shows the syntax for declaring an indexer annotated with ownership and effect information.
public class Calendar[owner] {
    public int firstDayOfWeek = 0;
}

public class Week[owner] {
    public string[]|this| days = {"Sunday", "Monday", "Tuesday",
                                  "Wednesday", "Thursday", "Friday", "Saturday" };

    public int this[string day, Calendar|owner| cal]
            reads <owner> writes <> {
        get {
            int dayIndex =
                Array.LastIndexOf(days, day) - cal.firstDayOfWeek;
            return dayIndex < 0 ? dayIndex + 7 : dayIndex;
        }
    }
}
Listing 4.15: An example of how to annotate an indexer with context parameters and effects. The classes annotated were previously shown in Listing 4.14.
4.3.2.3 Delegates
In C], delegate objects represent methods; they are essentially a type-safe function
pointer. C] allows delegates to be returned from methods, passed as parameters, and
stored on the stack or in the heap as with any other data, thus making methods first-
class citizens of the language. When wrapping a method in a delegate and passing it
around, it is necessary to carry the call parameter and return type information as part
of the delegate's type. Listing 4.16 shows the C] syntax for declaring a delegate type called BinaryOp taking two Objects and returning an Object.
When a method is assigned to a delegate, C] ensures that the method’s types are
public delegate Object BinaryOp(Object objA, Object objB);
Listing 4.16: The syntax of a delegate taking two Objects and returning an Object.
consistent with the delegate’s types. The method’s return type is allowed to vary
covariantly and the method’s parameter types are allowed to vary contravariantly. So
in the case of the BinaryOp delegate shown in Listing 4.16 the method assigned to the
delegate would have to accept two Object parameters, but could return any type since
all types are subtypes of object. Trying to assign a method that accepted only int
parameters would fail. This ensures type correctness is preserved when delegates are
used.
Delegate type declarations, as with all other type declarations in the C] language,
can have type parameters and associated constraints on those parameters. When a
delegate type is named, the compiler must verify that the type parameters, if any, are
supplied and that the actual types supplied satisfy the type constraints on the delegate
declaration.
With the addition of effects to method signatures (see Section 3.3), it becomes necessary
to carry read and write effects with the delegate’s type to facilitate dependency analysis.
The context parameters for the parameter and return types of the delegate may not be
known when the delegate is declared, and so it is necessary to add context parameters
to the declaration just as for methods. Listing 4.17 shows a version of the BinaryOp
delegate originally shown in Listing 4.16 annotated with context parameters and side-
effects. Delegate declarations may also have context constraint clauses which must be
satisfied when an instance of the delegate type is named.
public delegate Object|C| BinaryOp[A,B,C](Object|A| objA, Object|B| objB)
    reads <A,B> writes <>;
Listing 4.17: An example of the use of context parameters and effect declarations onthe delegate originally shown in Listing 4.16.
When a delegate type is named, actual contexts must be supplied for the formal context
parameters, if any. Further, the supplied context parameters must satisfy any declared
context constraints on the delegate otherwise the delegate type is invalid.
When a method is assigned to a delegate the input parameter types and return types
must be checked for compatibility. Just as with generic type parameters, context pa-
rameters must be equal for the types to be compatible as was discussed in Section 3.2.3.
While the types themselves are allowed to vary as previously discussed, the context pa-
rameters are not, as allowing this would open a hole in the type system. Further,
the effect declarations of the delegate and the method being assigned to it need to be
checked to ensure that the method's declared effects are a subset of the
effects of the delegate. In this way, the delegate effects can be taken as the maximum
effects of any method assigned to it and it is not necessary to resolve which specific
method the delegate may invoke.
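As a hedged sketch (the method name and the method-group assignment syntax are assumed for illustration), a method whose declared effects are a subset of the effects declared on the BinaryOp delegate from Listing 4.17 could be assigned to it, while one that wrote to A or B could not:

    // Reads only A, a subset of the delegate's declared reads <A,B>, and
    // writes nothing, so the assignment below would be accepted.
    public Object|C| PickFirst[A,B,C](Object|A| objA, Object|B| objB)
            reads <A> writes <> {
        return new Object|C|();   // illustrative body only
    }
    ...
    BinaryOp|x, y, z| op = PickFirst|x, y, z|;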
Finally, the reading and writing of variables of delegate type is treated like the read
or write of any other variable and produces the appropriate side-effects as discussed in Section 3.3. Invoking a delegate stored in a variable causes a read of the variable in
addition to the read and write effects of the delegate itself.
4.3.2.4 Events
In software systems, there is frequently a need for parts of the application to be notified
of changes in the state of the system caused either internally or externally. For example,
in a program with a GUI, one or more pieces of code may want to be notified when a
particular button has been clicked or a particular option selected. The changes in the
state of the system are called events and many software patterns make use of them.
C] provides syntactic support for events. In C] an event is a list of delegates. An
event can be fired using method invocation-style syntax. When an event is fired,
all of the delegates in the event list are invoked sequentially with the supplied actual
parameters. Delegates can be added and removed from the list of observers. Once a
change in state occurs, all of the methods stored in the event list can be invoked from a
single line. An example of an event is shown in Listing 4.18; invoking listeners will
invoke all of the methods which have been stored in the listeners list of delegates.
Events cannot be aliased and so no direct changes to the event declaration syntax itself
are required. The changes to delegate types, namely the addition of context parameters
and effect declarations are indirectly consumed by the event construct. Event fields,
public class EventSource {
    public delegate void EventHandler(EventSource source);
    public event EventHandler listeners;

    public void fireEvent() {
        listeners(this);
    }
}

public class Subscriber {
    public Subscriber(EventSource target) {
        target.listeners += eventListener;
    }

    public void eventListener(EventSource source) { ... }
}
Listing 4.18: An example of a simple C] event using the delegate type EventHandler.
like other fields, may be either static or instance fields. Invoking the methods stored in
an instance event field causes a read of the containing object’s this context. Similarly,
invoking the methods stored in a static event field causes a read of the containing
class’s static context. Adding to or removing methods from an event causes a write of
the containing object’s this context for instance fields and the containing class’s static
context for static events. Listing 4.19 shows the same event example previously shown
in Listing 4.18, but this time annotated with context parameters and effect declarations.
public class EventSource[owner,observer] {
    public delegate void EventHandler(EventSource|owner| source)
        reads <this,observer> writes <observer>;

    public event EventHandler listeners;

    public void fireEvent() reads <this,observer> writes <observer> {
        listeners(this);
    }
}

public class Subscriber[owner,tgtOwner] {
    public Subscriber(EventSource|tgtOwner, this| target)
            reads <tgtOwner> writes <tgtOwner> {
        target.listeners += eventListener;
    }

    public void eventListener(EventSource|tgtOwner, this| source)
        reads <tgtOwner,this> writes <this> { ... }
}
Listing 4.19: An example of a simple C] event previously shown in Listing 4.18 now annotated with context parameters and effect declarations.
4.3.2.5 Anonymous Methods
Anonymous methods were introduced in C] version 2 and are simply methods without
a name. The method is declared inline with the code and is assigned to a delegate for
use elsewhere in the program. Common uses for anonymous methods in C] include the
installation of event handlers.
Anonymous methods operate in a similar manner to normal methods. If the anonymous
method needs access to input parameters, then these must be declared as part of
the anonymous method’s signature, and they must be compatible with the parameters
of the delegate the method is being assigned to. The return type of the anonymous
method is inferred from the delegate the anonymous method is being assigned to and
the method body validated against that inferred type. This is to say that the body of
the anonymous method is validated against the declared signature of the method which
is a combination of explicit parameters and an inferred return type. Listing 4.20 shows
an example of two anonymous methods which are assigned to the add and sub fields of
the simple calculator.
public class RPNCalc {
    public delegate int BinaryOp(int a, int b);

    public static readonly BinaryOp add =
        delegate(int a, int b) {
            return a + b;
        };
    public static readonly BinaryOp sub =
        delegate(int a, int b) {
            return a - b;
        };

    Stack<int> values = new Stack<int>();

    public void Push(int value) {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) {
        values.Push(op(values.Pop(), values.Pop()));
    }
}
Listing 4.20: An example of a simple Reverse Polish Notation calculator which defines binary operations to be applied to the stack as delegates. The calculator supplies two operations, add and sub, via anonymous method declarations.
The syntax and semantics of the anonymous method declaration operate much like a
method declaration, with the supplied method body being checked for correctness against
the signature. Anonymous methods cannot be overridden because they cannot be named. As
is the case with non-virtual methods, it is not necessary to declare the method’s side-
effects as part of the method signature — the effects can be computed by the compiler.
If an effect declaration is made, then the declared effects are checked against the actual
side-effects of the anonymous method’s body — the body effects must be a subset
of those declared. Delegate type compatibility is checked as previously discussed
when the anonymous method is assigned to a delegate. As with standard methods, the
anonymous method may have context parameters. The previous calculator example is
repeated with the addition of context parameters and effect declarations in Listing 4.21.
public class RPNCalc[owner] {
    public delegate int BinaryOp(int a, int b) reads <> writes <>;

    public static readonly BinaryOp add =
        delegate(int a, int b) reads <> writes <> {
            return a + b;
        };
    public static readonly BinaryOp sub =
        delegate(int a, int b) reads <> writes <> {
            return a - b;
        };

    Stack<int>|this| values = new Stack<int>|this|();

    public void Push(int value) reads <this> writes <this> {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) reads <this> writes <this> {
        values.Push(op(values.Pop(), values.Pop()));
    }
}
Listing 4.21: An example of an ownership annotated Reverse Polish Notation calculator class based on the original C] example shown in Listing 4.20. Note the effect declarations added to the anonymous methods.
Outer Variables
Anonymous method bodies are allowed to refer to local variables and fields defined
outside the scope of the anonymous method’s body. The local variables from the code
surrounding an anonymous method which are used in the anonymous method are called
outer variables. C] implements full lexical closures: changes to the local variables
made by the anonymous method can be observed by the surrounding code and vice
versa. Listing 4.22 shows an example of an anonymous method returned from the
operation method which captures the outer variables i and j.
public class OuterVarEx {
    public delegate int Op();

    public Op operation() {
        int i = 20;
        {
            int j = 30;
            return delegate { return i + j; };
        }
    }
}
Listing 4.22: A code listing showing the capture of local variables from two different scopes in the anonymous method returned from the operation method.
The C] compiler implements outer variables by extracting them into private inner
classes based on their declaration scope. Anonymous methods are turned into methods
on these private inner classes, again based on the scope of the anonymous method’s
declaration. Listing 4.23 shows how the C] compiler would implement the outer vari-
ables and anonymous method shown in Listing 4.22. Note that the compiler would use
a different naming scheme, but the example serves to demonstrate the transformation
employed.
public class OuterVarEx {
    public delegate int Op();

    private class OperationScope1 {
        public int i;
    }

    private class OperationScope2 {
        public OperationScope1 outerScope;
        public int j;

        public int AnonMethod1() {
            return outerScope.i + j;
        }
    }

    public Op operation() {
        OperationScope1 scope1 = new OperationScope1();
        scope1.i = 20;
        {
            OperationScope2 scope2 = new OperationScope2();
            scope2.outerScope = scope1;
            scope2.j = 30;
            return new Op(scope2.AnonMethod1);
        }
    }
}
Listing 4.23: An example showing how the C] compiler would implement the example shown in Listing 4.22 using private inner classes.
This transformation of outer variables into fields of an inner class means that a read
of a local variable which would have originally incurred only a local stack effect must
now cause a heap effect as well. This means that a data flow analysis needs to be
performed to identify which local variables are actually outer variables. Once outer
variables have been identified, a read of such a local variable must generate a read of the
this context and an assignment to it must generate a write of the this context, regardless
of whether that read or write occurs in the anonymous method or not. This ensures that
even if the anonymous method is invoked from another part of the program, the effect
on the captured local variables will be contained in the effect sets and the appropriate
data dependencies generated. Listing 4.24 shows how the implementation shown in
Listing 4.23 would be annotated with context parameters and effect declarations. List-
ing 4.25 shows how the original anonymous method would be annotated with context
parameters and effect declarations.
public class OuterVarEx[owner] {
    public delegate int Op() reads <this> writes <>;

    private class OperationScope1[owner] {
        public int i;
    }

    private class OperationScope2[owner] {
        public OperationScope1|owner| outerScope;
        public int j;

        public int AnonMethod1() reads <owner> writes <> {
            return outerScope.i + j;
        }
    }

    public Op operation() reads <this> writes <> {
        OperationScope1|this| scope1 = new OperationScope1|this|();
        scope1.i = 20;
        {
            OperationScope2|this| scope2 = new OperationScope2|this|();
            scope2.outerScope = scope1;
            scope2.j = 30;
            return new Op(scope2.AnonMethod1);
        }
    }
}
Listing 4.24: An example showing how the implementation of the example shown in Listing 4.22 would be annotated with context parameters and effect declarations.
public class OuterVarEx[owner] {
    public delegate int Op();

    public Op operation() reads <this> writes <> {
        int i = 20;
        {
            int j = 30;
            return delegate { return i + j; };
        }
    }
}
Listing 4.25: The outer variable capture example from Listing 4.22 annotated with context parameters and effect declarations.
4.3.2.6 Lambda Expressions
Lambda expressions are similar to anonymous methods in that they are another means
of declaring anonymous inline executable code blocks. They are used in the same way
as anonymous methods, but differ significantly in the semantics of how they are checked
for correctness by the compiler.
The bodies of anonymous methods are checked by using the declared parameter infor-
mation. That is, the method body is validated before the assignment of the method
to the delegate is validated. Lambda expressions, on the other hand, do not require
parameter types to be specified; the types of the parameters are inferred from
the delegate the lambda is assigned to. Only after the type information for the sig-
nature is computed can the lambda expression’s implementation be validated. The
flow of type information is reversed in lambda expressions compared with anonymous
methods. Listing 4.26 shows an invalid anonymous method and an invalid lambda ex-
pression. The anonymous method fails to compile because the int parameter cannot
be converted to a short. The lambda fails to compile because it is bound to a delegate
which accepts an int parameter; it would compile if the delegate accepted a short.
This change to the order in which lambda expressions are type checked relative to
other subroutine definitions has a significant impact on how the context parameters
and effect declarations are added to lambda expressions. The details of the methods a
lambda expression invokes may not be computable until the expression is bound to a
delegate type and the types of the lambda expression’s parameters become known. Be-
cause of this, it is incorrect to ask the programer to declare the side-effects of a lambda
public class AnonVsDelegate {
    public bool IsZero(short value) { return value == 0; }

    public delegate string NumToStr(int value);

    public NumToStr anonMethod =
        delegate(short i) { return IsZero(i) ? "Empty" : i.ToString(); };

    public NumToStr lambda =
        i => IsZero(i) ? "Empty" : i.ToString();
}
Listing 4.26: An example code snippet showing the difference in the typing of anonymous methods and lambda expressions. The anonymous method fails to compile because the int parameter cannot be implicitly converted to a short, while the lambda fails to compile because the delegate it is bound to takes an int parameter; it would succeed if the delegate took a short as a parameter.
expression. The effects can, however, be computed as part of the lambda’s implemen-
tation validation. This effect information can then be checked for consistency against
the declared effects of the delegate. Further, because the programmer does not have
to specify the types of the lambda’s parameters, it is incorrect to ask the programmer
to list a set of context parameters for the method. Context information is received
from the delegate the lambda expression is being bound to. This means that there
are no syntactic changes required for lambda expressions, but the compiler needs to do
more work to carry context parameter information into the validation of the lambda’s
implementation. It must also compute the effect of the lambda expression after the
body has been validated and ensure the effects are a subset of those declared by the
delegate. Listing 4.27 shows an example of a calculator used previously to demonstrate
the annotation of anonymous methods. The lambda expression syntax is not modified
because the allowable effects are obtained from the delegate to which the lambda is
assigned, maintaining consistency with the semantics of lambda expressions.
It is important to note that because lambda expressions are declared inline and passed
through the program using delegates, there is no way for them to be overridden. This
means that the lack of an effect declaration does not affect the correctness of effects
in the presence of overriding as was the case with anonymous methods.
Lambda expressions, as with anonymous methods, may capture local variables, thus
turning them into outer variables. The compiler uses the same techniques to implement
these outer variables and so the same effect computation procedures as were outlined
for outer variables associated with anonymous methods must be followed.
public class RPNCalc[owner] {
    public delegate int BinaryOp(int a, int b) reads <> writes <>;

    public static readonly BinaryOp add = (a, b) => a + b;
    public static readonly BinaryOp sub = (a, b) => a - b;

    Stack<int>|this| values = new Stack<int>|this|();

    public void Push(int value) reads <this> writes <this> {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) reads <this> writes <this> {
        values.Push(op(values.Pop(), values.Pop()));
    }
}
Listing 4.27: An example of an ownership annotated Reverse Polish Notation calculator class based on the original C] example shown in Listing 4.20 with the anonymous methods replaced by lambda expressions.
4.3.2.7 Extension Methods
C] version 3.0 introduced extension methods as a mechanism to inject methods on to
existing types without having to modify or extend the original type definition. Exten-
sion methods are implemented as static methods, but appear to be instance methods
on the type of the extension parameter. Listing 4.28 shows an extension method dec-
laration. The first parameter of the method is preceded by the this keyword which
denotes the static method as an extension method on C]’s built-in string type which
is an alias for the System.String type in the .NET Base Class Library.
public static class StringExtensions {
    public static int WordCount(this string str) {
        string[] words = str.Split(' ');
        return words.Length;
    }
}
Listing 4.28: An example of an extension method which adds a WordCount method to the interface of string.
The side-effects for an extension method could be computed like those for any other
static method. However, this would not fully exploit the benefits of having the extension
method appearing to be an instance method on the type of the extension parameter.
Instead, it would be more consistent to analyze the extension method as if it were a
method on the type of the extension parameter. This means that reads or writes of
the extension parameter’s representation would, therefore, generate a read or write of
this. Listing 4.29 shows the extension method shown in Listing 4.28 with the addition
of context parameters and effect declarations. Note how reads of the str variable cause
reads of the this context.
public static class StringExtensions {
    public static int WordCount(this string str) reads <this> writes <> {
        string[]|this| words = str.Split|this|(' ');
        return words.Length;
    }
}
Listing 4.29: An example of an extension method annotated with context parameters and effect declarations showing reads of the extension parameter generating reads of this.
4.3.3 Types
In Chapter 3, the discussion of how to apply context parameters to data types was
limited to standard object-oriented classes used to produce heap allocated objects.
The C] programming language provides a number of other data types, some with
quite different semantics from classes, which also need to be annotated with context
parameters. In this section I discuss the application of Ownership Types to these types.
4.3.3.1 Ref and Out Call Parameters
Java employs a strict pass-by-value semantics for method actual parameters. C] em-
ploys the same pass-by-value semantics, but also supports two additional parameter
passing modes denoted on parameters using ref and out. A ref parameter is passed
by reference, thus allowing the original source of the value to be updated by the method.
An out parameter is the same as a ref parameter except that it is allowed to be unini-
tialized and so the parameter must be assigned a value in the method body before it
is used. Expressions used to compute actual ref and out parameter values must be
assignable expressions. Obviously, these different passing conventions have an impact
on the effect computation strategy outlined in Section 3.3.
Fortunately, the ref and out keywords identify which parameters may be assigned to by
a method. The use of one of these keywords identifies that the parameter is passed by
reference, rather than by value as is usually the case. Passing by reference means that
the caller can “see” any changes made by the method to the variables supplied. These
keywords document the read and write effects possible with these special parameters
and so these special parameters do not have to be included in the method’s effect
declaration.
When computing the effect of invoking a method with ref or out parameters, the
effect of evaluating the method’s actual call parameters needs to be modified. The
expressions passed by reference may be both read and written by the method. Rather
than perform a complex analysis to determine if the parameter passed by reference is
read, written, or both, my effect system conservatively assumes that it is both read and
written. That means that any expression supplied as an actual ref or out parameter
may be used as both an l-value and an r-value. To compute the side-effects of such an
actual parameter, the effect of reading the expression and assigning to the expression
need to be computed and the union of these two sets of effects is the total effect of
evaluating that parameter. This accounts for the allowed effects involving these special
parameters without having to add additional effects to the method signature since the
parameters themselves document the side-effects.
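As a sketch of this computation, consider the hypothetical fragment below; the Counter class, the Accumulate method, and the ctxA context are illustrative only. The actual ref argument is conservatively treated as both read and written, so the effects of reading it and of assigning to it are unioned into the call's total effect.

public class Counter[owner] {
    public int total;

    // The ref keyword itself documents that slot may be both read and written,
    // so no additional effects need to appear in the effect declaration.
    public void Accumulate(ref int slot, int amount) reads <> writes <> {
        slot = slot + amount;
    }
}
...
Counter|ctxA| c = new Counter|ctxA|();
int local = 0;
// Passing c.total by reference contributes the union of the effects of reading
// and of assigning c.total (reads <ctxA> and writes <ctxA>) in addition to the
// effects declared on Accumulate.
c.Accumulate(ref c.total, 5);
// Passing a local variable by reference contributes only local stack effects.
c.Accumulate(ref local, 5);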
4.3.3.2 Partial Types
C], unlike Java, allows a class definition to be split into multiple parts in different
files. When generic type parameters are present on a partial type, all of the partial
type’s declarations must have the same number of generic type parameters in the same
order with the same names [78]. To maintain consistency, when a partial type is
parameterized by context parameters, all of the partial implementations of the type
must have the same context parameters with the same names in the same order. This
is necessary to ensure that the different parts are all correctly associated with one
another during compilation.
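A hypothetical two-file sketch of this restriction is shown below; the Warehouse class, its members, and its context parameters are illustrative only.

// File WarehouseStorage.cs
public partial class Warehouse[owner, data] {
    private List<Object|data|>|this| stock;
}

// File WarehouseQueries.cs -- must repeat the same context parameters,
// with the same names, in the same order as every other part.
public partial class Warehouse[owner, data] {
    public int Count() reads <this> writes <> {
        return stock.Count;
    }
}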
4.3.3.3 Interfaces
Interfaces are collections of method signatures, without implementations, which a type
may promise to implement. Interfaces are not instantiated independently and do not
have their own representation. Interfaces do not, therefore, have their own this context;
98 Chapter 4. Application to an Existing Language
they only have context parameters which may be used to help specify the read and write
effects of the methods on the interface. As a result, the special meaning of the first
formal context parameter being the owner is dropped for interfaces because they do
not have any representation of their own and they do not form part of any object’s
representation. An object which implements an interface has an owner and a this
context like any other object in Zal.
The methods in an interface can have many different implementations; each type which
implements the interface supplies its own method implementations. To ensure consis-
tent effects between all of the implementations, it is necessary to include effect decla-
rations on all of the method signatures listed in an interface and ensure these effects
are consistent with the method implementations when supplied.
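The following hypothetical interface sketches these two points: its context parameters exist only so that effect declarations can be written on its method signatures, and those declarations bound the effects of every implementation. The IRepository and ListRepository names are illustrative, and the placement of the actual context parameters on the implemented interface follows the |...| annotation style used elsewhere in this chapter rather than a syntax fixed by the thesis.

public interface IRepository[rep, data] {
    // rep abstracts the context holding an implementation's representation;
    // data is the context owning the stored objects. There is no this context.
    void Store(Object|data| item) reads <rep> writes <rep>;
    Object|data| Fetch(int key) reads <rep, data> writes <>;
}

// An implementing class has an owner and a this context like any other object;
// its method bodies must produce effects that are subsets of those declared above.
public class ListRepository[owner, rep, data] : IRepository|rep, data| {
    ...
}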
4.3.3.4 Arrays
In C], an array is a special object that can be used to hold a sequence of either references
to reference types or copies of value types. Because array memory is heap allocated and
arrays are passed by reference in C] they, like any other object, need to have an owner
which denotes the region of memory in which the array is located. It is important to
note that the owner of the array is distinct from the owner of the elements of the array,
if the elements are reference types; the array may be located in a different context from
the objects it references.
C] also provides support for jagged and multi-dimensional arrays. Jagged arrays are
simply arrays of arrays and so each dimension of the jagged array needs to have an
owning context supplied. Multi-dimensional arrays are single objects which have two or
more dimensions required to access an array element. These multi-dimensional arrays
have a single owner since they represent a single object in memory.
Listing 4.30 shows examples of the different types of array discussed and shows those
same examples annotated with context parameters. Note that the context parameters
for the element’s type appear immediately after the type’s name while the array’s owner
is situated directly to the right of the []s.
// single dimension arrays of value types
int[] values = new int[5];
int[]|owner| values = new int[5]|owner|;

// single dimension arrays of reference types
Object[] values = new Object[5];
Object|objOwner|[]|aryOwner| values = new Object|objOwner|[5]|aryOwner|;

// jagged arrays of reference types
Object[][][] values = new Object[5][][];
Object|objOwner|[]|dim1Owner|[]|dim2Owner|[]|dim3Owner| values =
    new Object|objOwner|[5]|dim1Owner|[]|dim2Owner|[]|dim3Owner|;

// multi-dimensional arrays
Object[,,] values = new Object[5,5,5];
Object|objOwner|[,,]|aryOwner| values = new Object|objOwner|[5,5,5]|aryOwner|;
Listing 4.30: Examples of single, jagged, and multi-dimensional arrays and their annotation with context parameters.
4.3.3.5 User Defined Value Types
In Java, all user-defined types are reference types: the object is allocated on the heap
and a reference is used to access the object. This means that only the reference to the
object is copied on assignment, not the object itself. As was discussed in Section 3.3.2,
I have chosen to handle stack and heap effects orthogonally since the aliasing of stack
locations is strictly controlled, unlike heap locations. This orthogonal handling was
chosen to simplify the ownership and effect systems. The handling of user defined
value types I present in this section is a direct result of this orthogonal handling.
C] provides user-defined reference types in the form of classes as Java does. Unlike Java,
C] also allows user-defined value types in the form of structs. When user-defined value
types are assigned, an entire copy of the struct instance is made. An example of this
copy on assignment behavior is shown in Listing 4.31. The restricted stack model of the
C] programming language means that references to stack allocated objects are tightly
controlled. They can be referenced only when passed as a ref or out parameter to a
method (see Section 4.3.3.1) or when captured as an outer variable by an anonymous
method or lambda expression (see Section 4.3.2.5).
The aliasing of user-defined value types is heavily restricted due to their copy-on-
assignment semantics. This means that a struct is either stored on the stack or on the
public struct Employee {
    public string name;
}
...
Employee employee1 = new Employee();  // writes employee1
employee1.name = "Bob";               // writes employee1
Employee employee2 = employee1;       // reads employee1 and writes employee2
employee2.name = "Fred";              // writes employee2; employee1.name is still "Bob"
Listing 4.31: A code fragment showing the copy on assignment behavior of a user-defined value type.
heap as part of some object’s representation. The key point to note is that a user-
defined value type is not allocated its own chunk of memory. Heap allocated objects
which store structs allocate the memory as part of the object’s implementation. Stack
variables which hold structs reserve a piece of stack memory of sufficient size for the
struct. This means that the user-defined value type does not have its “own” memory.
The net result of this is that a struct does not need to have an owning context. A struct
may have fields which hold references to non-value types which means that value-types
may still need context parameters. A struct may, therefore, be parameterized by any
number of context parameters to facilitate the construction of types within the struct,
but the special owner meaning attached to the first context parameter in classes does
not apply to structs.
Consider the code in Listing 4.32 which shows an example of a struct representing a
point in a two dimensional coordinate system. Note that the struct holds a reference to
a heap allocated object. Following a discussion of side-effects in structs, this example
will be annotated with context parameters and effect declarations in Listing 4.33.
Methods and other similar executable code blocks can read
and write the fields of user-defined value types. The reads and writes of these
fields can be described as reads and writes of the struct’s representation using the this
context. The difference between the this effects on the methods of structs compared to
those on classes is in their abstraction when the this context cannot be named directly.
When the this context is abstracted, it is abstracted to a container. In the case of a
class, the this is abstracted to another context parameter representing a heap memory
region containing the object’s representation. In the case of a struct, the container may
be a heap memory region in the form of a context (for a struct copy stored inside a heap
public class CoordinateSystem {
    ...
}

public struct Point {
    public int x;
    public int y;
    public CoordinateSystem system;

    public Point(int x, int y, CoordinateSystem system) {
        this.x = x; this.y = y; this.system = system;
    }

    public double GetDistance(Point p) {
        return Math.Sqrt(Math.Pow(x - p.x, 2) + Math.Pow(y - p.y, 2));
    }
}
Listing 4.32: An example of a struct in the form of a two dimensional coordinate in a coordinate system. Note the struct holding a reference to the CoordinateSystem class.
allocated object) or a local variable name for a struct stored on the stack. This means
that the this context may be abstracted to a context or a local variable depending on
where the struct is being stored.
It is important to note that the this context cannot be used as a context parameter
when naming a type inside a struct since it does not correspond to a region of the
program heap; it is used only in effect sets to describe reads and write of the struct
itself. Listing 4.33 shows the Point struct shown in Listing 4.32 annotated with con-
text parameters and effect sets. Note that the struct has a context parameter which
represents the owner of CoordinateSystem held by the struct. Also note the effects of
the GetDistance method which reads this, the current struct.
4.3.3.6 Static Classes
C] allows types to be declared with a static modifier. The static modifier means that
the type cannot be instantiated. Non-static classes are usually parameterized by a list
of formal context parameters. The first parameter is, by convention, used to hold the
owner of an instance of the type, i.e., the owner of an object. Because a static class cannot
be instantiated, it does not need a context parameter for its owner. The static class
may still have context parameters, as do user defined value types (see Section 4.3.3.5),
to allow types to be constructed within the static class. There is no actual change to
public class CoordinateSystem[owner] {
    ...
}

public struct Point[sysOwner] {
    public int x;
    public int y;
    public CoordinateSystem|sysOwner| system;

    public Point(int x, int y, CoordinateSystem|sysOwner| system)
        reads <> writes <> {
        this.x = x; this.y = y; this.system = system;
    }

    public double GetDistance(Point p) reads <this> writes <> {
        return Math.Sqrt(Math.Pow(x - p.x, 2) + Math.Pow(y - p.y, 2));
    }
}
Listing 4.33: The Point value type shown in Listing 4.32 annotated with context parameters and effect declarations.
the syntax of the context parameter list, but the special meaning of the first parameter
does not apply to classes declared to be static. It is important to note that because the
type cannot be instantiated there is no this accessible within the type and there are no
sub-context declarations.
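A minimal hypothetical sketch of such a class is given below; GeometryUtil and Vector are illustrative names, and Vector is assumed to be an ordinary Zal class with public double fields x and y. The single context parameter data is not an owner; it exists only so the Vector type used by the method can be named.

public static class GeometryUtil[data] {
    // No owner parameter and no this context: the class is never instantiated.
    public static double Length(Vector|data| v) reads <data> writes <> {
        return Math.Sqrt(v.x * v.x + v.y * v.y);
    }
}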
4.3.3.7 Nullable Types
Nullable types add null to the range of values that can be represented by a value type.
This means that a value type can have a specific value to represent an unassigned state. For
example, consider the bool data type: an unassigned bool variable defaults to a value
of false. This initial value of false cannot be distinguished from a value of false
assigned to the variable. A nullable bool (written bool?) can be initialized to null.
The null indicates that a value has not been assigned. The nullable types syntax used
in C] is syntactic sugar which hides the fact that the nullable types are implemented
by the compiler using the System.Nullable<T> struct. Because they are implemented
with structs, and so are value types, no context parameters are required and so there
is no modification of their syntax.
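The distinction can be seen in a small fragment of plain C]; because bool? is just syntactic sugar for the value type System.Nullable<bool>, no ownership annotations are involved.

bool plain = false;    // default value, indistinguishable from an assigned false
bool? tracked = null;  // null records that no value has been assigned yet

if (tracked.HasValue && tracked.Value) {
    // only reached once tracked has explicitly been assigned true
}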
4.3.3.8 Existing Types
When writing programs in Zal, it is necessary to make use of C] class libraries, such as
the Base Class Library. These libraries contain a large number of classes including a
number of data types, interfaces, and collection data structures regularly used in pro-
grams. Third party code may not always be annotated with owner or effect information.
Because it is highly desirable to continue using these classes despite the lack of declared
effect information, I have included a syntactic construct in my implementation of the
Zal language. This is similar to the wrapper classes introduced in Java 1.5 to facilitate
the use of existing classes [114]. This construct, which I have called an effectTemplate,
allows the programmer to provide ownership and effect information for existing definitions.
These programmer-provided effects are currently treated by the
compiler as assertions and they are not validated. It is possible that this ownership
and effect information could be validated by analyzing the byte-code, but currently the
compiler treats effectTemplate declarations as programmer assertions and does not
validate them.
An effectTemplate must be tied to an existing type definition such as a class, struct,
or interface. The type tied to the template is referred to as the annotated type. The
annotated type is declared on the effectTemplate definition using C]-style inheritance
syntax as shown in Listing 4.34. In the listing, an effectTemplate is being declared
to add an owner context parameter to the existing IList interface in the Base Class
Library.
public effectTemplate IList<T>[owner] : IList<T> { ... }
Listing 4.34: An example of the syntax for declaring an effectTemplate to inject context and effect information onto an existing type.
An effectTemplate does not have to declare formal context parameters; if context
parameters are not being added, the declaration syntax is slightly different, as shown in
Listing 4.35. The annotated type is the type named after the effectTemplate keyword.
An effectTemplate serves two purposes: (1) it allows types constructed within the
annotated type to be parameterized with context parameters and (2) it allows effect
public effectTemplate IList<T> { ... }
Listing 4.35: An example of the syntax for declaring an effectTemplate which does not add formal context parameters to a type, but still injects context and effect information onto member declarations it contains.
declarations to be added to the constructors, methods, properties, indexers, and dele-
gates declared in the annotated type. The effectTemplate contains no implementation
details; no bodies can be supplied on subroutine declarations and no initializers may be
supplied on fields. Further, all definitions in the template must have equivalent defini-
tions without ownership information in the annotated type. A template does not need
to declare effect and ownership information for all definitions in a type; the program-
mer may annotate only a subset of them. However, when declaring effect information
for two types S and T, where S is a subtype of T (S <: T), it is necessary to ensure
the effects declared in S’s effectTemplate are a subset of those declared on any inherited
definitions in T’s effectTemplate. This is the same restriction required of normal Zal classes and
ensures that the declared effects of a method are the maximum effects possible when invoking
the method or any overriding implementation. At present there is limited support for
this validation in the Zal compiler, but full support could be easily implemented using
existing compiler infrastructure.
As previously discussed, the compiler treats the information contained in
effectTemplates as programmer supplied assertions which must be true. Further,
an effectTemplate which adds context parameters to a type can coexist with the
original type definition and both may be used in a program. It may be possible to
build a verifier or inference system which processes Common Intermediate Language
(CIL) byte-code to automatically create effectTemplates for existing types, but such
work is beyond the scope of this thesis.
Listing 4.36 shows an effectTemplate which annotates some methods of C]’s
ICollection<T> interface.
effectTemplate ICollection<T>[owner, data] : ICollection<T>
    where T : Object|data| {
    void Add(T item) reads <this> writes <this>;
    void Clear() reads <this> writes <this>;
    bool Contains(T item) reads <this, data> writes <>;
    bool Remove(T item) reads <this, data> writes <this>;
    void CopyTo[arrayOwner](T[]|arrayOwner| array, int arrayIndex)
        reads <this, arrayOwner> writes <arrayOwner>;
    T Item {
        get reads <this> writes <>
        set reads <this> writes <this>
    }
}
Listing 4.36: An example of an effectTemplate which adds effect declarations to some of the methods of the ICollection<T> interface.
4.4 Statics
One of the major semantic and syntactic features present in both Java and C] not
discussed in most ownership type systems is statics. Static fields and methods belong to
the type they are declared on and not an instance of the type. The traditional system
of contexts discussed in Chapter 3 does not consider these static language features.
Static methods and fields were considered in one chapter of Potanin’s thesis [95]. I
have taken a different approach to handling statics for consistency with existing C]
semantics. This section discusses how to account for these language features in Zal.
4.4.1 Static Fields
The class a static field belongs to can be statically determined. This static storage
is separate from the representation hierarchy formed by heap allocated objects and
separate from the stack. Each class’s static storage is separate from the static storage
of every other class; there is no hierarchical structure as there is for heap allocated
objects.
The question remains of how to represent reads and writes of static fields in the effect
system. I propose that the static storage of each class be abstracted as a special static
context. Each type in the system has a context, with the same name as the type,
in which its static representation is stored. I call these contexts type contexts. All
type contexts are disjoint from one another and disjoint from all ordinary contexts.
Type contexts are dominated by the special top context, world. An example of a type
context as an effect is shown in Listing 4.37, where reading the static field value in the
DataStore results in a DataStore type context read.
public class DataStore {
    public static int value = 1;

    public int getValue() reads <DataStore> writes <> {
        return DataStore.value;
    }
}
Listing 4.37: An example of a read of a static field of the DataStore class causing a read of the DataStore type context.
The next thing to consider is what happens to static fields when classes are extended.
In C] classes can inherit fields, including static fields, through subtyping. When static
fields are inherited, the child class and the parent class share the same instances of the
inherited static variables. This means that reads and writes of inherited static fields
must be treated as reads and writes of the type context in which they were originally
declared. An example of this is shown in Listing 4.38.
public class Parent {
    public static int value = 1;
}

public class Child : Parent {
    public int getValue() reads <Parent> writes <> {
        return Child.value;
    }
}
Listing 4.38: The read of Child.value actually reads the value field declared on the Parent class and so results in a read of the Parent type context.
Static fields declared on types parameterized by generic type parameters are unique
to the specific instance of the generic type named. For example, given a generic class
Test<T> with a static field value, Test<int> would have a different value field from
Test<long>. This behavior is consistent with the use of type names to name type
contexts.
When a static field declared on a type parameterized by generic type parameters is
read or written, the static context named in the appropriate read and write effects
consists of the type name and the actual type parameters used to name the type.
For example, a read of the value field on type Test<int> would cause a read of the
Test<int> static context. Like ordinary static contexts, the static contexts named for
a generic type are heap effects. The specific static contexts constructed from a generic
type are all disjoint provided the type parameters supplied on the type are not the
same. For example, the context Test<int> is disjoint from the context Test<long>.
Lastly, the behavior of static fields of types with context parameters needs to be con-
sidered. In the case of generic type parameters, a different static field is associated with
each constructed type. Further, generic type parameters can be used in the type of a
field as shown in Listing 4.39.
public class Example<T> {
    public static T value;
}
...
Example<int>.value     // value is an int
Example<string>.value  // value is a string
Listing 4.39: An example of a generic type parameter being used in the declaration of a static field.
When parameterizing a type with context parameters, it may be necessary to use some
of the context parameters to construct the types of static fields just as generic type
parameters may be used to construct the types of static fields. I have chosen to maintain
a behavior consistent with that of generics, i.e., a different static field is associated with
each type constructed with different actual context parameters. This means that for
a type Test[owner] with a static field value, constructed types Test|actual1| and
Test|actual2| would be associated with different value fields. Listing 4.40 shows an
example of a class with a static field whose type is constructed using the class’s formal
context parameter.
public class Example[owner] {
    public static Object|owner| value;
}
...
Example|actual1|.value  // value is owned by actual1
Example|actual2|.value  // value is owned by actual2
Listing 4.40: An example of a static field whose type is constructed using class context parameters.
4.4.2 Methods
Static methods are the same as instance methods except that they do not
have access to their type’s instance fields and methods. The effect declarations on these
static methods are the same as the effect declarations on instance methods with the
exception that the effect sets cannot contain the this context or declared sub-contexts
since the method is not associated with an instance of the type.
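A hypothetical sketch of this restriction is shown below; the Logger class and Record method are illustrative only. The write of the static counter surfaces as an effect on the Logger type context introduced in Section 4.4.1, while neither this nor any declared sub-context may appear in the effect sets.

public class Logger[owner] {
    private static int count;
    private int lastValue;   // an instance field

    public static void Record() reads <Logger> writes <Logger> {
        count = count + 1;    // read and write of the Logger type context
        // lastValue = count; // illegal: no this is available in a static method
    }
}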
4.5 LINQ
One of the major additions made in version 3.0 of the C] programming language was
the LINQ query sublanguage [78]. LINQ provides a syntax reminiscent of SQL which
allows programmers to manipulate collections and other data sources in a declarative
manner. When a LINQ query is applied to a collection it can enumerate and process
the elements of a collection in much the same way as the foreach loop. An example
of a simple LINQ query is shown in Listing 4.41.
public class Item {
    public string name;
    public static List<Item> items;

    public static void Main() {
        IEnumerable<Item> result =
            from item in items
            where item.name.StartsWith("B")
            select item;
    }
}
Listing 4.41: An example of a simple LINQ query.
Under the hood, LINQ expressions are syntactic sugar which are reduced to a series
of chained method invocations on the expression’s data source. The LINQ language
defines a number of methods, called standard query operators, which are used to im-
plement LINQ queries. The LINQ library shipped with C] version 3.0 provides several
different implementations of the standard query operators, including a set for the
collections in the .NET Base Class Library. The LINQ query shown in Listing 4.41
would be implemented using the methods supplied in the LINQ library. The reduction
of LINQ expressions into chained method invocations is achieved through the repeated
application of a set of ordered rules supplied in the language specification [78]. Com-
puting the side-effects of a LINQ query or parts thereof is, therefore, no different than
computing the effects of any other series of chained method invocations provided that
the side-effects of the standard query operators are known. In my implementation of
Zal, discussed in Chapter 6, I have captured the standard query operator side-effects
to facilitate the analysis of LINQ expressions. Listing 4.42 shows the result of reducing
the LINQ expression shown in Listing 4.41 to a series of chained method invocations.
public class Item {
    public string name;
    public static List<Item> items;

    public static void Main() {
        IEnumerable<Item> result =
            items.Where(n => n.name.StartsWith("B")).Select(n => n);
    }
}
Listing 4.42: The reduction of the simple LINQ query example shown in Listing 4.41 to a series of chained method invocations.
Declarative operations on collections are amenable to parallelization as was demon-
strated with foreach loops in Section 2.4.1.1. As part of version 4 of the .NET Frame-
work, Microsoft has added a Parallel LINQ (PLINQ) library which can be used to
parallelize LINQ queries [79]. As was the case with foreach loops, not all LINQ
queries can be safely parallelized. Microsoft has not published a complete set of suf-
ficient conditions for the safe parallelization of LINQ queries, but they have provided
the following list of potential pitfalls with PLINQ [81]:
1. Do not assume that parallel is always faster
2. Avoid writing to shared memory locations
3. Avoid over-parallelization
4. Avoid calls to non-thread-safe methods
5. Limit calls to thread-safe methods
6. Avoid unnecessary ordering operations
7. Prefer ForAll to ForEach when it is possible
8. Be aware of thread affinity issues
9. Do not assume that iterations of ForEach, For and ForAll always execute in
parallel
Employing PLINQ on queries which have any of these pitfalls could result in a program
which may not produce a correct or consistent result.
Detecting if a LINQ expression writes to a shared memory location is easily done using
the effect system I have proposed once the LINQ methods have been annotated with
effect information. The annotation is necessary to ensure that any side-effects of specific
LINQ operations are accounted for in the effect system. My system could, therefore, be
used to help facilitate the exploitation of inherent parallelism in LINQ queries, just as
in foreach loops.
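As a hypothetical illustration using the Item class from Listing 4.41, the first query below only reads its source, so its computed effect summary contains no writes of shared contexts and it is a candidate for automatic use of PLINQ. The second query writes a shared list from inside the query; the effect system would surface this as a write effect, and it corresponds directly to pitfall 2 above.

// Candidate for parallelization: the query only reads items.
var lengths = items.AsParallel().Select(i => i.name.Length);

// Not safely parallelizable: the selector writes the shared list seen.
var seen = new List<string>();
var tagged = items.Select(i => { seen.Add(i.name); return i; });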
Chapter 5
Formalization
In the preceding chapters I have developed a system for reasoning about parallelism
in modern, imperative object-oriented languages. My proposals have been supported
by reasoned arguments, but I have not formalized and proved these ideas. The goal of
this chapter is to formalize my proposals and sketch proofs to show that my proposed
sufficient conditions for parallelism allow parallelization to take place only when doing
so would not violate any dependencies.
Proving the sufficient conditions for the safe application of parallelism using the type
and effect systems I have proposed requires a number of supporting proofs. The overall
structure of the argument presented in this chapter is shown in Figure 5.1. The arrows
in Figure 5.1 indicate how the proofs compose; the proof at the base of the arrow is
used as part of the proof at the head of the arrow.
Before the proofs of the correctness of the sufficient conditions for parallelism proposed
in Chapter 3 can be sketched, a number of language properties need to be established:
1. Subject Reduction — types are preserved by reduction
2. Progress — a well typed program is a value or may reduce
3. Soundness — an expression can only generate a runtime type error if it makes
use of an unsafe down-cast
4. Well Formed Heap — types are preserved on the heap
Figure 5.1: The structure of the proof of sufficient conditions for parallelism correctness presented in Chapter 5. The proofs of the items highlighted in red are cited in the literature rather than re-derived.
5. Owner Invariance — the owner of an object cannot be changed by casting,
assignment, or expression reduction
6. Cast Safety — if an expression makes use of up-casts, no invalid casts will be
encountered during reduction
7. Context Structure is a Tree — the contexts in a program form a tree rooted
at the world context.
8. Context Parameters do not survive — during reduction, all context param-
eters are replaced with locations
Having established that the above properties hold, the next step is to demonstrate that
the following properties hold for the effect system:
1. Effect Soundness — effects are preserved by all effect rules; no effect can be
captured and then lost
2. Effect Completeness — all reads and writes of the stack and heap are captured
as part of the computed effects
3. Disjointness Test Correct — demonstrates the basic pointer chasing algo-
rithm, used by the Zal runtime system for testing context disjointness, is correct
4. Context Disjointness Implies Effect Disjointness — stating that two contexts are
disjoint implies that the contexts, together with all of the contexts below them in the
ownership tree, are disjoint from one another
5. Disjoint effects imply no data dependencies — if there are no overlaps
between the write sets of two expressions or the read set of one and the write set
of the other, then there are no data dependencies between the two expressions
Each of the key features at the core of the type and effect system I have proposed can
be found in different ownership systems in the literature. Rather than re-deriving the
proofs for these different core language features and the key properties of Ownership
Types, I cite existing work to argue for the language properties I rely on. The proofs for
the correct and safe operation of the advanced language features described in Chapter 4
can be directly derived from these core language proofs using the syntactic approach
for proving type soundness proposed by Wright and Felleisen [126].
I then use the properties proven for the core language to sketch how to prove that if
data dependencies (flow, output, and anti dependencies) are preserved by a parallelizing
transformation, then the result of the transformation will be sequentially consistent.
By sequentially consistent, I mean that executing the parallel program will produce
the same result and side-effects as executing the original sequential program. Finally,
I sketch proofs for the correctness of the sufficient conditions proposed for the safe use
of task, data, and pipeline parallelism as discussed in Chapter 3.
Readers uninterested in the details of these proofs and this more formal presentation
of the system can skip the remainder of this chapter. Readers who are interested in the
key details of these proofs should read on; the presentation generally follows the order
of discussion just presented with a few minor exceptions.
5.1 Type System
Before a program’s type and effect information can be relied on for the purposes of
parallelization, it is first necessary to argue that the language’s type and effect systems
generate complete and accurate information. In this section I cite previous work to
prove different features and facets of my proposed type and effect system. The goal is
to cite previous work to lend credence to the argument that my proposal is valid from a
type theoretic perspective. I will begin by citing existing Ownership Types literature to
argue for the safety of my proposed type system and I will follow this with an argument
for the safety and completeness of the effect system I have proposed.
5.1.1 Type Rules
When arguing for the correctness and safety of the type rules employed by my proposed
system, there are two main areas of concern: the core language (C]) and the ownership
extensions applied to the core language.
Fruja has published a proof of the type safety of the C] type system [42]. This proof
covers the core features of the C] language. Fruja’s proof does not directly cover
C] generics. Jula has formalized the semantics of C] generics [65]; combined with
the proofs of generic types proposed for Java, including Featherweight Generic Java
(FGJ) [60], this suggests that the C] generics system should also be type safe. The proofs
contained in these works support the argument that the language properties listed
previously hold for the core language I chose to extend.
Ownership Types has been an area of active type systems research for a number of
years [28, 41, 95, 38, 24, 25]. Java has traditionally been used as a basis for ownership
research languages. While the C] language is different from Java in a number of ways,
the core operation of the two languages is largely similar as can be seen from the
similarities in the formulations of FJ’s [60] and Fruja’s proofs [42]. While some work
would have to be done to adapt the language proofs cited below for use in C], the
language similarities suggest that this should not be an overly onerous task.
At its core, the hierarchical single-owner system I have adopted in my proposal was
first proposed by Clarke, Potter, and Noble [28]. The type system supporting this
original formulation was quite strict [41]. Subsequent work by a number of authors has
helped to relax some of these restrictions. My proposal is most closely related to the JoE
research language and its subsequent derivatives [27, 24]. JoE employs a hierarchical
single owner system [27] with a notation quite similar to my proposed system. The key
properties I rely on, namely ownerships being immutable once assigned and ownership
contexts forming a tree hierarchy, are characteristic of all these systems.
Ownership Generic Java (OGJ) combines Java generics with ownership annotations in
a concise manner [95]. The final formulation presented in Potanin’s thesis includes full
language proofs which support the language properties I require as discussed in the
introduction. My use of C] generics with ownerships is similar to OGJ in a number of
ways. Like OGJ, Zal allows parent types and implemented interfaces on type declara-
tions to be annotated with ownership information. The handling of static fields in OGJ
is also similar to that in Zal. Unlike OGJ, Zal requires an explicit list of context pa-
rameters on type declarations in addition to the ownership annotations on super-types
and is purely syntactic in nature. C] does allow generic types to be parameterized by
primitive types, quite unlike Java. Since I elected to treat value and reference types or-
thogonally, extending Potanin’s proofs would be relatively straightforward and would
be a mechanical exercise.
Allowing ownerships to be resolved at a sub-object level of granularity has been pro-
posed in a number of Ownership Types systems including JoE [27] and Ownership
Domains [3]. The use of explicitly declared subcontexts is most similar to the explicit
declaration of domains in Ownership Domains [3] which is another well proven system.
The use of constraints on ownership context parameters on types and methods has
previously been proposed in the literature. The domination operator (<) was first
formulated in JoE [27]. The disjointness operator (#) was extensively discussed in the
multiple ownership MOJO system [25]. Both JoE and MOJO employ a hierarchical
ownership system and so these two different formulations could be easily combined.
Overall, none of the ownership specific language features I have proposed as part of
Zal is novel from an Ownership Types perspective. What is new and different is the
combination of language features and their combined use to reason about data depen-
dencies and inherent parallelism. Because the language features themselves are all well
proven in isolation, combining these proofs to prove the language properties I require is
a largely mechanical, if tedious and time consuming, process. Proving the correctness
of the annotation systems I have created could then be done using Wright and
Felleisen’s syntactic approach [126] since these extended language features are, for the most
part, syntactic sugar.
5.1.2 Effect Rules
I am not the first to propose capturing side-effects of evaluation in terms of ownership
contexts. Previous systems such as JoE [27], Effective Ownership Types [73], and
Boyapati, Lee, and Rinard’s system for preventing data races and deadlocks [20] have
all employed some form of ownership effect annotation. These systems have all shown
how to construct an effect system that is both sound and complete. The style of effect
system that I have proposed is quite similar to that found in JoE and Boyapati et
al’s work. The major difference in my formulation is that I employ disjoint read and
write effect summaries without having a context read implied by a context write. This
does not materially change the effect system itself and so the proofs employed in these
previous systems could be easily adapted to work on Zal to prove the properties I require
hold. Specifically, proving that all read and write effects are captured when generated
and are not subsequently lost during effect computation of compound expressions is a
largely mechanical process.
5.2 Proof of Ancestor Tree Search Algorithm
Given that the type system I have proposed ensures that object owners are fixed once
allocated and that the ownership contexts form a tree, I now proceed to prove the
correctness of the runtime context disjointness test (#). In this section, I formally define
context disjointness, prove that context disjointness implies effect disjointness and,
finally, prove the correctness of the simple parent pointer chasing algorithm described
in Section 3.4.4 for runtime disjointness testing. The other algorithms I proposed, using
Dijkstra views and Skip Lists, are optimizations of this basic algorithm.
To help with the creation of a formal definition of the disjointness test, I first define a
helper function, ancestors, as shown in Figure 5.2.
Ancestor Contexts:

ancestors(Γ, l1) = l1, ancestors(Γ, Γ(l1))
ancestors(Γ, lworld) = ∅

Figure 5.2: The helper function ancestors used to test for context disjointness. Note that Γ represents the type checking environment and Γ(l1) obtains the parent of context l1.
Definition 5.1 (Context Disjointness).
l0 # l′0 iff l0 ∉ ancestors(l′0) and l′0 ∉ ancestors(l0).
Theorem 5.2 (Context Disjointness Implies Effect Disjointness).
If two contexts satisfy l0 # l′0, then the areas of memory represented by l0 and l′0 when
they are named as effects are disjoint.
Proof. Ownership contexts form a tree hierarchy [28]. If l0 # l′0, then the two contexts lie
on different branches of the tree. By the structure of a tree, we know that the subtrees
rooted at the context nodes, which delimit the scope of the effects, cannot overlap since each
object has exactly one owner. Therefore, to show that two contexts named as effects are
disjoint, it is sufficient to show that the two contexts are disjoint from one another.
Theorem 5.3 (Static Context Relations).
The world context is a dominator of all other contexts. An object of type C|k1, ..., kn| has
owner k1, and k1 dominates the this context.
Proof. To prove the universal domination of world, we use the fact that the world
context is an ancestor of all contexts, and so, by the definition of context disjointness,
any disjointness test against world will always fail. Similarly, we know that the this
context is always a child of its owner k1, and so they are never disjoint.
Having formally defined disjointness and proven that context disjointness implies effect
disjointness, I now prove the correctness of the algorithm. To do this, it is necessary
to demonstrate that, if b is an ancestor of a, then following the parent pointers from a
will find context b. Since ownership contexts in my proposed system form a tree and
ownership information cannot change, the proof can be done by induction.
Figure 5.3: The base case for the ancestor algorithm inductive proof: a and b are the same node.
Figure 5.4: The induction step of the ancestor algorithm proof: b is a parent of the nth parent of a.
The base case is shown in Figure 5.3, where a and b are the same context. No pointers
need to be followed and b has been found in 0 steps from a. Now assume, as the induction hypothesis, that an ancestor b, n levels removed from a, can be found by following parent pointers in n steps. It remains to prove that an ancestor b, n + 1 levels removed from a, can be reached by following parent pointers from a in n + 1 steps, as illustrated in Figure 5.4. By the induction hypothesis, we know that we can get from a to the nth ancestor of a in n steps by following the parent pointers. Following the parent pointer from the nth ancestor of a arrives at the (n + 1)th ancestor of a, which is b, in n + 1 steps, concluding the proof of the traversal algorithm.
By performing the walk from both of the contexts being tested for disjointness, both
the case of a being an ancestor of b and b being an ancestor of a are handled. This
means that if a parent-child relationship exists between the contexts, this relationship
will be found. If such a relationship does not exist, then the two contexts must be
disjoint.
This algorithm will always terminate as any program has a finite number of objects
and there are no cycles in the ownership hierarchy.
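To make the traversal concrete, the following C] fragment is a minimal sketch of the parent-pointer-chasing test; the ContextNode type and its Parent field are illustrative stand-ins for the runtime ownership records, not the actual Zal runtime structures described in Chapter 6.

// Illustrative sketch of the runtime disjointness test (#).
public sealed class ContextNode {
    public ContextNode Parent;   // null for the world context
}

public static class ContextTest {
    // Two contexts are disjoint when neither is an ancestor of the other.
    public static bool Disjoint(ContextNode a, ContextNode b) {
        return !IsAncestorOrSelf(b, a) && !IsAncestorOrSelf(a, b);
    }

    // Walks parent pointers from 'start'; terminates because the ownership
    // hierarchy is a finite tree with no cycles.
    private static bool IsAncestorOrSelf(ContextNode candidate, ContextNode start) {
        for (ContextNode cur = start; cur != null; cur = cur.Parent)
            if (cur == candidate)
                return true;
        return false;
    }
}

Performing the walk from both arguments mirrors the proof above: if neither walk encounters the other context, no ancestor relationship exists and the contexts are disjoint.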
5.3 Proof of Condition Correctness
Having argued that the type system underpinning my reasoning is sound, that the effect system is sound and complete, and that the runtime disjointness test proposed in Section 3.4.4 is correct, I conclude by proving that the sufficient conditions
for parallelism I previously stated in Chapter 3 are, indeed, sufficient for their purposes.
Whenever part of a computer system, be it hardware or software, allows program op-
erations to be reordered, at runtime, relative to the order specified by the programmer,
there should be some kind of consistency model guaranteed by the transformational
process. The consistency model provides a specification of what changes can be made
to the execution order of the program and, consequently, what invariants will be pre-
served by the transformation.
These consistency models and the preservation of program meaning are important for
enforcing determinacy. Denning and Dennis provide an eloquent definition for determi-
nacy, “Determinacy requires that a network of parallel tasks in shared memory always
produces the same output for given input regardless of the speeds of the tasks” [36].
They also eloquently describe why it is important, “It tells us we can unleash the full
parallelism of a computational method without worrying whether any timing errors or
race conditions will negatively affect the results” [36].
When automatically applying a parallelizing code transformation to a sequential pro-
gram, it is generally desirable to ensure that the transformation satisfies Bernstein’s
Conditions for parallelism. Bernstein’s Conditions, as originally defined, state that two
sub-routines S1 and S2 can be safely executed in parallel provided that:
• IN(S1) ∩ OUT(S2) = ∅
• OUT(S1) ∩ IN(S2) = ∅
• OUT(S1) ∩ OUT(S2) = ∅
where IN(S) is the set of memory locations used by S and OUT(S) is the set of memory
locations written to by S [15]. All parallelizing transformations which guarantee that
these conditions are satisfied will produce the same results as the original sequential
program. There are specific cases where less strict conditions may suffice, but for
program correctness, in the general sense, to be preserved, these conditions must be
satisfied.
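As a concrete illustration, the check reduces to three set intersections. The sketch below uses hypothetical string-named memory locations; it is not a representation used elsewhere in this thesis.

// A minimal sketch of checking Bernstein's Conditions on two code blocks
// whose IN and OUT sets of memory locations are known.
using System.Collections.Generic;
using System.Linq;

public static class Bernstein {
    public static bool CanRunInParallel(
            ISet<string> in1, ISet<string> out1,
            ISet<string> in2, ISet<string> out2) {
        return !in1.Intersect(out2).Any()    // IN(S1) ∩ OUT(S2) = ∅
            && !out1.Intersect(in2).Any()    // OUT(S1) ∩ IN(S2) = ∅
            && !out1.Intersect(out2).Any();  // OUT(S1) ∩ OUT(S2) = ∅
    }
}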
Bernstein’s Conditions ensure that there are no dependencies between the code
blocks to be run in parallel. There are two categories of dependencies which may exist
in a program: control dependencies and data dependencies. If these dependencies are
preserved when the program is parallelized then the determinacy of the program is
preserved.
In all of the sufficient conditions I have proposed, I have stated that no control depen-
dencies which would prohibit parallelization exist in the code being analyzed. Existing
systems can be used to reason about and prove the absence of such control dependencies
and so I do not address control dependencies in this section; they are declared not to ex-
ist. This section focuses on proving that when the sufficient conditions I have proposed
are satisfied, the appropriate parallelizing transformation can be safely applied.
The sufficient conditions I have proposed focus on ensuring no data dependencies are vi-
olated when parallelizing code, and I prove the correctness of these sufficient conditions
in this section.
Before beginning the detailed proofs of correctness, it is important to define what a
data dependence is. I quote this definition from Goff, Kennedy, and Tseng [46]:
We say that a data dependence exists between two statements S1 and S2 if
there is a path from S1 to S2 and both statements access the same location
in memory. There are four types of data dependence:
• True (flow) dependence occurs when S1 writes a memory location
that S2 later reads.
• Anti dependence occurs when S1 reads a memory location that S2
later writes.
• Output dependence occurs when S1 writes a memory location that
S2 later writes.
• Input dependence occurs when S1 reads a memory location that S2
later reads.
Of these four types of data dependency, only three are usually considered when data
dependencies are discussed in relation to reordering code transformations: flow, output,
and anti dependencies. Input dependencies do not restrict reordering of operations and
so are, by common convention, ignored.
The first step in building the proofs of the sufficient conditions for parallelism is to link the effects computed by my effect system to data dependencies. To do so, I must first define what it means for two sets of effects to be disjoint.
Definition 5.4 (Disjoint Effects).
If e and e′ are well-typed expressions and ⊢ e : ϕ and ⊢ e′ : ϕ′, then effects ϕ are said to be disjoint from effects ϕ′ (written as ϕ # ϕ′), where ϕ = 〈r, w | x, y〉 and ϕ′ = 〈r′, w′ | x′, y′〉, when r#w′, w#r′, w#w′, x#y′, y#x′, and y#y′.
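Definition 5.4 translates directly into a pairwise check over the two effect tuples. The following sketch is illustrative only: the Effects and Context types and the supplied ctxDisjoint predicate are assumed placeholders, not the compiler's ResolvedEffects machinery described in Chapter 6.

// Illustrative sketch of the effect disjointness test of Definition 5.4.
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Context { /* placeholder for an ownership context */ }

public sealed class Effects {
    public ISet<Context> R = new HashSet<Context>();  // heap contexts read
    public ISet<Context> W = new HashSet<Context>();  // heap contexts written
    public ISet<string> X = new HashSet<string>();    // stack variables read
    public ISet<string> Y = new HashSet<string>();    // stack variables written
}

public static class EffectAlgebra {
    // phi # phi': heap sets are pairwise context-disjoint and the relevant
    // stack sets share no variable names.
    public static bool Disjoint(Effects p, Effects q,
                                Func<Context, Context, bool> ctxDisjoint) {
        bool heapOk = AllDisjoint(p.R, q.W, ctxDisjoint)
                   && AllDisjoint(p.W, q.R, ctxDisjoint)
                   && AllDisjoint(p.W, q.W, ctxDisjoint);
        bool stackOk = !p.X.Intersect(q.Y).Any()
                    && !p.Y.Intersect(q.X).Any()
                    && !p.Y.Intersect(q.Y).Any();
        return heapOk && stackOk;
    }

    private static bool AllDisjoint(IEnumerable<Context> a, IEnumerable<Context> b,
                                    Func<Context, Context, bool> ctxDisjoint)
        => a.All(c1 => b.All(c2 => ctxDisjoint(c1, c2)));
}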
Having defined effect disjointness, I now prove, in Theorem 5.5, that when two expres-
sions have disjoint effects then no data dependencies can exist between them.
Theorem 5.5 (Disjoint Effects Imply No Data Dependencies).
If e and e′ are well-typed expressions such that e is evaluated prior to e′, and ⊢ e : ϕ and ⊢ e′ : ϕ′ with ϕ#ϕ′, then there are no flow, output, or anti dependencies between
e and e′.
Proof. Let ϕ = 〈r, w|x, y〉 and ϕ′ = 〈r′, w′|x′, y′〉. The proof of this theorem is now done
by contradiction, considering the three cases of flow, output, and anti dependencies.
Begin assuming, by way of contradiction, that there is an output dependency between
e and e′. If that dependence is via a field f on an object owned by context c, then
by side-effect information being sound and complete we know that c ∈ w and c ∈ w′.
However, we know from ϕ#ϕ′, w#w′, which provides the contradiction. If the output
dependence is via a variable then both must write to a variable x and, again, by side-
effect information being sound and complete, x ∈ y and x ∈ y′. However, we know
from ϕ#ϕ′, y#y′ which provides the contradiction.
Now assume, once again by way of contradiction, that there is a flow dependence
between e and e′. If that dependence is via a field f on an object owned by context c,
then by effect soundness and completeness c ∈ w and c ∈ r′. However we know from
ϕ#ϕ′, w#r′, which provides the contradiction. If the flow dependence is via a variable
x, then by effect soundness and completeness x ∈ y and x ∈ x′. However, we know
from ϕ#ϕ′, y#x′, which provides the contradiction and concludes the case. A mirror
argument may be made to prove that anti dependencies do not exist, which concludes
the proof.
I now prove that it is sufficient to preserve only flow, output, and anti dependencies to
preserve sequential consistency.
Lemma 5.6 (Update Dependency Preservation Sufficient for Parallelization with Se-
quential Consistency).
In the absence of control dependencies which would prohibit parallelization, if two code
blocks S1 and S2 have no flow, output, or anti dependencies between them, then they
can be safely executed in parallel.
Proof. For S1 and S2 to execute safely in parallel in my proposed system, the version
executed in parallel must be sequentially consistent with the original implementation.
If an output dependence exists between S1 and S2 with each writing a value to a field
f on an object o and they execute in parallel, then the final value of f depends on
whether S1 or S2 writes to f first. This race could cause the sequential consistency
model to be violated and so to guarantee sequential consistency we need to ensure no
output dependencies exist. Mirror arguments can be made for the need to prohibit flow
and anti-dependencies.
Assume now, by way of contradiction, that a flow dependence exists between S1 and
S2, with S1 writing a value to a field f on an object owned by context c, later read
by S2, and that they execute in parallel. If S1 writes a value later read by S2, the
value read by S2 depends on the order in which S1 and S2 run. This could violate the
sequential consistency guarantee, so it is sufficient to ensure that no flow dependencies
exist. A mirror argument can be constructed for anti dependencies.
Finally, if an input dependence exists between S1 and S2, the value of the location read does not change. Hence, the two reads can be performed in either order without violating sequential consistency, which concludes the proof.
A number of the disjointness operations in the following proofs operate on sets of con-
texts named as effects. I now define context set disjointness to simplify the presentation.
Definition 5.7 (Context Set Disjointness).
l̄ # l̄′ when ∀l ∈ l̄ (∀l′ ∈ l̄′ . l # l′).
The rest of this section argues for the correctness of the sufficient conditions.
5.3.1 Task Parallelism
The first and simplest parallelization pattern I discussed was task parallelism — the
execution of two blocks of code in parallel. Two blocks of code, S1 and S2, can be safely executed in parallel provided no control dependencies which would prohibit parallelization exist between them and no data dependencies, in the form of flow, output, or anti-dependencies, exist between them. I previously stated that it was sufficient to show:
• Task Condition 1 — no control dependencies exist which would prohibit par-
allelization.
• Task Condition 2 — the contexts read by statement S1 are disjoint from the contexts written by statement S2.
• Task Condition 3 — the contexts written by statement S1 are disjoint from the contexts written by statement S2.
• Task Condition 4 — the contexts written by statement S1 are disjoint from the contexts read by statement S2.
I now formalize these conditions and argue that they are indeed sufficient.
Theorem 5.8 (Task Parallelism Condition Sufficiency).
Given ⊢ S1 : 〈r, w | x, y〉 and ⊢ S2 : 〈r′, w′ | x′, y′〉, then if r#w′, r′#w, w#w′, x ∩ y′ = ∅, x′ ∩ y = ∅, and y ∩ y′ = ∅, S1 can be run in parallel with S2 provided there are no control dependencies which would prohibit parallelization between S1 and S2.
Proof. Assume that c is a context where c ⊆ w and c ⊆ w′. Given that each context
has a single owner, it must be the case that either c ⊆ wi ⊆ w′j or c ⊆ w′j ⊆ wi for some valid i and j. Figure 5.5 shows the relationships between c, wi, and w′j. From
the theorem we know wi#w′j , for all valid i and j, which is a contradiction, thus there
cannot be a heap output dependence between S1 and S2. The absence of heap flow
and anti dependencies can be proved with mirror arguments using w with r′ and r with
w′, respectively. Stack locations have unique names and so the lack of overlap between
stack locations read and written ensures that there are no stack data dependencies.
Therefore, by Lemma 5.6, having proved the absence of data dependencies and assumed
a lack of control dependencies, the sufficient condition’s correctness is proved.
Figure 5.5: The relationship between contexts c1, c2, and object x.
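For intuition, the transformation that Theorem 5.8 licenses has the following shape. This sketch is illustrative only: s1 and s2 are delegate-valued stand-ins for the two statement blocks, and Parallel.Invoke is the standard .NET API rather than code generated by the Zal compiler.

// Illustrative sketch of the task parallel transformation.
using System;
using System.Threading.Tasks;

public static class TaskTransform {
    public static void RunWhenIndependent(Action s1, Action s2) {
        // Sequential form: s1(); s2();
        // When the effect sets of the two blocks satisfy Theorem 5.8 they
        // touch no common dependent locations, so they may run concurrently.
        Parallel.Invoke(s1, s2);
    }
}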
5.3.2 Data Parallelism
A loop can be safely parallelized, provided no data or control dependencies exist be-
tween iterations. In Section 3.5.2, I stated the following sufficient conditions for the
safe parallelization of a simple foreach loop of the form shown in Listing 5.1.
foreach (C|k| e in collection)
    e.operation();

Listing 5.1: The simple stereotypical data parallel foreach loop for which sufficient conditions for parallelization were developed.
The sufficient conditions stated were:
• Loop Condition 1: there are no control dependencies which would prevent loop
parallelization,
• Loop Condition 2: the elements enumerated by the iterator supplying elements
to the loop body must be disjoint, and
• Loop Condition 3: the operation’s write set contains only the representation
of the element on which it is invoked. It does not read the representation of any
other elements in the collection although it can read data from other disjoint
contexts.
I now formalize these conditions and prove they are sufficient for their stated purpose.
Theorem 5.9 (Data Parallelism Condition Sufficiency).
Consider a foreach loop of the form shown in Listing 5.1. Let e1...en represent the
elements returned by the iterator to be processed by the loop. It is sufficient to show
that:
1. there are no control dependencies which would prohibit parallelization
2. ∀i, j ∈ 1...n, i ≠ j ⇒ ei ≠ ej
3. T operation() reads<r> writes<w> where ∀w ∈ w (w ≤ this) ∧ ∀r ∈ r (r ≤ this ∨ r # k1),
for the loop iterations to be safely executed in parallel.
Proof. The proof is done by contradiction and the argument proceeds in a similar way
to that presented for the proof of correctness of the task parallelism sufficient conditions.
Assume, by way of contradiction, that an output dependence exists between iterations.
Let e1 and e2 be two separate elements in e1...en such that e1.operation() writes to
a field of an object in context x and e2.operation() writes to that same field on that
same object in context x.
From hypothesis condition 3, w may contain at most the this context. This means that e1.operation() can write only to e1 or contexts strictly dominated by e1. Similarly, e2.operation() can write only to objects that are either e2 or strictly dominated by e2.
By hypothesis condition 2, we know e1 6= e2. Figure 5.6 shows this set of relationships.
Figure 5.6: The relationships between e1, e2, and x as used in the proof of effect disjointness.
From this point, the proof proceeds by the same argument made to prove Task Con-
dition 2. If x is contained in both e1 and e2, then it must be the case that either e1 dominates e2 or e2 dominates e1. However, because e1 and e2 have the same owner, k1, e1 ≠ e2. Since object owners cannot be changed and ownership contexts form a tree structure, we know that e1 ≠ e2 ⇒ e1#e2. Since e1#e2, there can be no context x
which is part of both e1 and e2 since each context has a single owner, which provides
the contradiction.
Assume now, by way of contradiction, that a flow dependence exists. Then e1...en must contain two distinct elements e1 and e2 such that e1.operation() writes to a field of
some object in context x and e2 reads from the same field of that same object in x.
There are two possible locations for x since a context read, r, may be either dominated
by this or disjoint from k1.
If r is dominated by this then we need to prove that there is no x that is contained in
both e1 and e2, which we have just done. If, on the other hand, r is disjoint from e1 (by r # k1), we must show that there is no x such that x is dominated by both e1 and r.
Figure 5.7 shows the relationships of e1, e2, r, and x.
Figure 5.7: The relationships of e1, e2, r, and x and the disjointness of k1 and r for the proof of effect disjointness.
If x is dominated by both e1 and r then it must be the case that r is dominated by
e1 or that e1 is dominated by r. However, from hypothesis condition 3, we know that
e1#r by r#k1 and we know that ownership contexts form a hierarchy. Together, this
provides the contradiction. A mirror argument can be made to prove the absence of
anti dependencies. Having, therefore, proved that no flow, output, or anti-dependencies
can exist when the conditions are met, the safety of parallelizing this loop follows from
Lemma 5.6 which concludes the proof.
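The transformation justified by Theorem 5.9 can be pictured as replacing the sequential loop of Listing 5.1 with a parallel enumeration. The sketch below is illustrative only: IElementOperation is a hypothetical stand-in for the element type C|k|, and Parallel.ForEach is the standard .NET API rather than the scheduling machinery used by Zal.

// Illustrative sketch of the data parallel transformation.
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IElementOperation { void Operation(); }

public static class LoopTransform {
    public static void RunDataParallel(IEnumerable<IElementOperation> collection) {
        // Sequential form: foreach (var e in collection) e.Operation();
        // With distinct elements and writes confined to each element's own
        // representation, iterations cannot interfere with one another.
        Parallel.ForEach(collection, e => e.Operation());
    }
}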
5.3.3 Pipeline
It now remains to prove that the algorithm presented in Chapter 3 for constructing a
Data Dependency Graph (DDG) correctly identifies all of the inter- and intra-iteration
loop dependencies. The DDG construction algorithm as originally presented in Sec-
tion 3.5.3:
1. Consider all statements in the loop body pairwise (including each statement with
itself):
(a) Add an inter-iteration dependency if a flow, output, or anti-dependency
exists between the two statements via any context which is not dominated
by the element’s this context or any stack variable declared outside the loop
body’s scope.
(b) Otherwise, if the two statements are different, add an intra-iteration depen-
dency if a flow, output, or anti-dependency exists between the two state-
ments via a context dominated by or equal to this context or any stack
variable declared inside the loop body’s scope.
Before proceeding to prove that all the inter- and intra-iteration dependencies are
correctly identified by the DDG construction algorithm, I first present an example of
such a loop to refine the nature of the pipeline being considered. This example is shown
in Listing 5.2. I also present a formalized version of the algorithm. As was the case
with data parallelism, arbitrary pipeline stages can be transformed into the standard
form shown in Listing 5.2. Note that the behavior of the stack can be modelled using
objects and so an explicit stack is omitted from the formalization.
foreach (T|o| e in collection) {
    e.op_1(...);
    e.op_2(...);
    ...
    e.op_n(...);
}

Listing 5.2: An example of the style of loop intended for pipelining.
Theorem 5.10 (Pipeline Condition Sufficiency).
Let e1 . . . em be the set of objects in collection to be processed by the pipeline,
op1 . . . opn be methods on all of the objects e1 . . . em, outer be the set of stack variables
read and written by the loop, inter be the set of inter-iteration dependencies as tuples
of stages and intra be the set of intra-iteration dependencies as tuples of stages. Each
stage has heap read effects rj, heap write effects wj, stack read effects xj and stack
write effects yj.
1. ∀i, j ∈ 1 . . . n :
       ((wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj) ⊈ {this})
       ∨ (((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) ∩ outer ≠ ∅)
   ⟺ (stagei, stagej) ∈ inter

2. ∀i, j ∈ 1 . . . n : i ≠ j ∧
       ((∅ ≠ (wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj) ⊆ {this})
       ∨ (((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) − outer ≠ ∅))
   ⟺ (stagei, stagej) ∈ intra
Proof. Intra-iteration dependencies are a special case of inter-iteration dependencies.
All dependencies are inter-iteration dependencies unless the dependency exists entirely
on iteration-unique state.
Assume that, for two operations i and j in the loop, an output intra-iteration dependency exists between the stages. The proof of the correctness of these classifications now proceeds by contradiction. Assume, by way of contradiction, that an inter-iteration
dependency exists between the i and j operations. There are two cases for the depen-
dency to consider:
Case (Stack Dependence): If the inter-iteration dependence is caused through a stack location, that stack location must be declared outside the scope of the loop. However, from step 2 of the algorithm above, the outer variables are explicitly excluded when checking for intra-iteration dependencies by ((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) − outer ≠ ∅ in the rule. This provides the contradiction and the inter-iteration dependence cannot exist through the stack.
Case (Heap Dependence): If the inter-iteration dependence is caused through a heap location, that heap location must be accessible to multiple loop iterations. However, from step 2 of the algorithm above, intra-iteration dependencies can exist only through the representation of the data element being processed, by the (wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj) ⊆ {this} portion of the rule. Because all of the elements in the collection share the same owner and are not equal, their representations must be disjoint, which provides the contradiction, and the inter-iteration dependence cannot exist through the heap.
Mirror arguments can be made to show that no inter-iteration flow or anti-dependencies
can exist when the algorithm identifies an intra-iteration dependence. This concludes
the proof.
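To summarise the classification rule operationally, the following sketch builds the inter- and intra-iteration dependency sets from per-stage effect summaries. The Stage type, the string-named contexts, and the outer variable set are illustrative assumptions, not the compiler's actual DDG data structures.

// Illustrative sketch of the DDG classification rule.
using System.Collections.Generic;
using System.Linq;

public sealed class Stage {
    public ISet<string> R = new HashSet<string>();   // heap contexts read
    public ISet<string> W = new HashSet<string>();   // heap contexts written
    public ISet<string> X = new HashSet<string>();   // stack variables read
    public ISet<string> Y = new HashSet<string>();   // stack variables written
}

public static class DdgBuilder {
    public static (HashSet<(int, int)> inter, HashSet<(int, int)> intra)
            Build(IList<Stage> stages, ISet<string> outer) {
        var inter = new HashSet<(int, int)>();
        var intra = new HashSet<(int, int)>();
        for (int i = 0; i < stages.Count; i++)
            for (int j = 0; j < stages.Count; j++) {
                var heap = stages[i].W.Intersect(stages[j].R)
                    .Union(stages[i].R.Intersect(stages[j].W))
                    .Union(stages[i].W.Intersect(stages[j].W)).ToHashSet();
                var stack = stages[i].Y.Intersect(stages[j].X)
                    .Union(stages[i].X.Intersect(stages[j].Y))
                    .Union(stages[i].Y.Intersect(stages[j].Y)).ToHashSet();
                // Dependence through anything other than the element's own
                // representation, or through a variable declared outside the
                // loop body, crosses iterations.
                if (!heap.All(c => c == "this") || stack.Overlaps(outer))
                    inter.Add((i, j));
                else if (i != j && (heap.Count > 0 || stack.Except(outer).Any()))
                    intra.Add((i, j));
            }
        return (inter, intra);
    }
}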
5.4 Summary
In this chapter I have argued for the correctness of the Zal language based on previous
Ownership Types work. I have also sketched the proofs of the correctness of the suffi-
cient conditions I proposed for exploiting inherent task, data, and pipeline parallelism.
These results provide rigor and precision to support my proposals.
Chapter 6
Implementation
In Chapter 4, I presented the design of the Zal programming language — C] version 3
extended with the Ownership Types based realization of my ideas from Chapter 3. Be-
fore I can use Zal in Chapter 7 to validate my ideas by applying them to representative
sample applications, I needed to implement a compiler for the language. The design
and implementation of my Zal compiler and its associated runtime ownership tracking
system is the focus of this chapter.
The main contribution of this chapter is the detailed design of my compiler and runtime
system. I chose to use the Gardens Point C] (GPC]) [32] research C] compiler as a
basis for my compiler implementation. The resulting Zal compiler is a source-to-source
compiler in the tradition of CFront [112]. This means that the compiler reads in Zal
programs and produces C] source code. I, therefore, had to implement all of Zal’s
semantics in C]. There are numerous small technical contributions throughout this
chapter as I present the design of the compiler and how it implements Zal’s semantics.
Readers not interested in these technical details can safely skip this chapter.
6.1 Background
Once the need to implement a “compiler” had been identified, the first major decision
was choosing whether to extend an existing C] compiler or to write one from scratch.
After studying the problem, it was decided to proceed with writing a new C] compiler
from the ground up. This thesis project is just one of many language research projects
and there is a common need to be able to experiment with new language features
and type system extensions. Such language research can be targeted at both general
purpose programming as well as more domain specific applications. The goal in writing
a new C] compiler was to create a new piece of research infrastructure which could be
used in these different projects. To facilitate this, it was necessary to modularize the
compiler design so that compilation steps could be combined as desired and extended
easily where necessary. Examples of this modularity include the declarative grammar
used to describe the language and the strict separation of phases as will be discussed
shortly.
We decided to implement a source-to-source compiler rather than producing Common
Intermediate Language (CIL) byte-code directly from the compiler. This simplified the
implementation of the compiler and made it easier to debug the compiler output. The
compiler produces C] code and new language features added in research languages are
implemented in C] during the code generation phase. The modular design means that
a byte-code generator could be written at a future date when it becomes necessary for
the research being undertaken.
The base compiler reads in a C] version 3.0 source file and produces a C] version 3.0
source file after type checking the program. Compiler extensions modify the input
language and define the implementation of new language features in the C] code pro-
duced. The compiler has been tentatively named the GPC] compiler and is available
online [32].
6.2 Implementation Attribution
Before continuing to describe the design and implementation of the compiler and run-
time system I need to acknowledge and thank my supervisor Dr. Wayne Kelly and
Prof. John Gough for all the work they did helping to write the GPC] compiler. Their
assistance was invaluable in getting this project completed on-time. I refactored the
GPC] compiler so that I could then extend it with my proposed ownership type and
effect system. This refactoring included significant changes to the handling of type
parameters and parts of the type checking and resolution processes. The design and
implementation of the ownership extensions to the compiler and the associated runtime
libraries are entirely my own, but would not have been possible without the joint effort
to create the GPC] compiler.
Table 6.1 shows the relative sizes of the Ownership Extensions and the unmodified
Gardens Point C] compiler. Note that the core GPC] compiler statistics include code
written to provide the extension points required to implement the ownership exten-
sions. The creation of the extension points was non-trivial. The code required to
implement the extension points was generally easy to write, but required careful design
consideration and so consumed approximately 25% of the total development time of
the extensions.
Metric       Zal Compiler Total   GPC]     Extensions   Extensions (% total)
SLOC-P       39,444               27,288   12,156       30.8%
SLOC-L       22,201               14,957   7,244        32.7%
McCabe VG    6,152                4,248    1,904        30.9%

Table 6.1: Measures of the relative sizes of the Ownership Extensions and the GPC] compiler in terms of physical lines of code (SLOC-P), logical lines of code (SLOC-L), and cyclomatic complexity (McCabe VG) [76].
6.3 Design
As previously stated, the ultimate goal of writing our own C] compiler was to create a
research tool which could be used to experiment with language design including new lan-
guage features, type system extensions, and program analysis techniques implemented
on top of an existing industrial language. To facilitate this, the compiler design was
heavily componentized. The basic C] compiler was created and bootstrapped before I
began to create the compiler extensions needed to compile Zal.
In this section, I discuss the details of these different compilation steps and how I added
extension points to allow support of Zal to be added in a modular manner. Figure 6.1
shows the different operations performed by the compiler and the names of the methods
invoked to perform the operation.
Figure 6.1: A diagram showing the different parts of the compiler we have written. The Zal-only operations are shown in white boxes; these steps are skipped by the normal C] compiler.
6.3.1 Scanner & Parser
We decided to write our compiler in C]. There are several reasons for this choice:
existing research group familiarity with the language and associated tools, possibility of
bootstrapping the compiler as an interesting test case, consistency between the language
being compiled and the language of the compiler, and the advanced language features
of C] which we exploited in the compiler design as will be discussed.
There are a number of scanning and parsing tools available today, but few of them
produce C] code. Coco/R is a popular parser generator which comes in versions for
many languages including C]. Coco/R accepts an attributed LL(k) grammar, for an
arbitrary k greater than 1, and produces a scanner and parser for the specified lan-
guage [125]. The Coco/R generated scanner was replaced with one generated using the
Gardens Point Scanner Generator (GPLex) [48]. The C] language specification allows
unicode characters to appear in the source of C] programs. The Coco/R generated
scanner could not handle unicode characters and so was replaced to make the scanner
and parser compliant with the C] language specification [78]. This substitution required
minor modification to the Coco/R file templates, but no modification to Coco/R itself.
When creating the scanner and parser for my ownership type and effect extensions, the
goal was to isolate the scanner and parser for the extensions from that of the GPC]
compiler. Zal uses the same symbols as C] so the GPLex scanner specification did
not need to be modified. A separate grammar definition is used to produce the Zal
parser, but the grammar is mostly the same as the standard C] grammar. The modified
grammar produces the same basic Abstract Syntax Tree (AST) as the GPC] parser,
but Zal ASTs may contain some extra nodes and some nodes have extra fields and
methods added to track additional information. The modular design of the extension
allows the core GPC] compiler to still be compiled from the same code base without
modification.
The Coco/R grammar specification [125] consists of a sequence of production defini-
tions. Each production rule is specified using an EBNF expression as well as semantic
actions to execute as the EBNF expression is matched. The EBNF grammar allows
optional matching by placing portions of the EBNF expression between []s and op-
tional multiple matches using {}s. Semantic actions can be associated with any part
of the EBNF grammar and they are listed between (. .)s. Every production may have a number of input and output attributes, which pass information into the production when its non-terminal is used and return values from the production to its caller.
There were three main productions added: formal context parameters on type and
method definitions, actual context parameters on type constructions, and effect decla-
rations on invokable code blocks such as methods, property accessors, and constructors.
These main productions were added to existing language productions to implement the
language features as previously discussed in Chapter 4.
Listing 6.1 shows the grammar production rule which recognizes a list of actual context
parameters. A list of actual context parameters consists of at least one context param-
eter listed between ||s. The listed contexts are stored as a list of context parameters.
Listing 6.2 shows the grammar rule for the list of formal context parameters production.
actualContextList<. out UnresolvedContextActuals cl .> =
  (. cl = null; List<string> l = new List<string>(); .)
  (
    "|"                    (. string i; .)
    contextName<out i>     (. l.Add(i); .)
    {
      ","                  (. string i2; .)
      contextName<out i2>  (. l.Add(i2); .)
    }
    "|"   (. cl = l.Count > 0 ? new UnresolvedContextActuals(l) : null; .)
  )
  .

Listing 6.1: Coco/R grammar production for a list of actual context parameters in Zal.
Like the rule for the list of actual context parameters shown in Listing 6.1, the list
of formal context parameters consists of at least one context parameter name listed
between []s. The parameters are stored as a list of ownership contexts which can then
be attached to the appropriate declaration.
formalContextList<. out ContextFormals cf .> =
  (. List<OwnershipContext> cl = new List<OwnershipContext>(); .)
  "["              (. string i; Token tok0 = la; .)
  ident<out i>     (. cl.Add(new OwnershipContext(tok0, i, false)); .)
  {
    ","            (. string i2; Token tok1 = la; .)
    ident<out i2>  (. cl.Add(new OwnershipContext(tok1, i2, false)); .)
  }
  "]"              (. cf = cl.Count > 0 ? new ContextFormals(cl) : null; .)
  .
Listing 6.2: Coco/R grammar production for a list of formal context parameters in Zal.
Lastly, Listing 6.3 shows the grammar production for a set of declared read and write
effects. As was the case with context parameters, the declared effect sets are stored as
lists and attached to the appropriate declaration.
Apart from the productions for formal and actual context parameters as well as effect
declarations, the production for the foreach loop was modified to accept the features of
the enhanced foreach loop described in Section 2.4.1.2. Listing 6.4 shows the modified
foreach loop production; notice the optional ref keyword and the optional portion of
the rule which recognizes the at <index> notation just before the "in" recognition.
effectList<out UnresolvedEffects e> =
  (. List<string> reads = null; List<string> writes = null; .)
  [ IF (IsReads()) identifier       (. reads = new List<string>(); .)
    "<"
    [                               (. string i; .)
      contextName<out i>            (. reads.Add(i); .)
      {
        ","
        contextName<out i>          (. reads.Add(i); .)
      }
    ]
    ">"
  ]
  [ IF (IsWrites()) identifier      (. writes = new List<string>(); .)
    "<"
    [                               (. string i; .)
      contextName<out i>            (. writes.Add(i); .)
      {
        ","
        contextName<out i>          (. writes.Add(i); .)
      }
    ]
    ">"
  ]
  (. e = new UnresolvedEffects(reads ?? new List<string>() { "world" },
                               writes ?? new List<string>() { "world" }); .)
  .
Listing 6.3: Coco/R grammar production for a set of declared read and write effects.
6.3.2 Abstract Syntax Tree
The Abstract Syntax Tree (AST) is a tree representation of a program. The nodes
in the tree represent declarations, statements, and expressions. The design of the
AST in GPC] employs principles from aspect-oriented programming. As an example,
consider the AST node for an if statement. The IfStatement class derives from the
Statement class which in turn derives from the Node class. The IfStatement contains
constructors, properties, a type checking method, and a code generation method. To
help make the source easier to navigate and maintain, groups of semantically related
AST nodes are implemented in the same source code file. Further, we have used C]’s
partial classes to break the implementation of classes up into different source files by
the aspect. These aspects correspond to AST node construction, type checking, and
code generation. These different aspect specific source files are organized into sub-
directories based on the compilation stage they are associated with. This means, for
example, that all of the TypeCheck methods for statements are located in the same
subdirectory of the source tree. The key advantage of this is that when working on
the compiler, programmers do not need to sift through code unrelated to the phase or
foreachStatement<out Statement s> =   (. TypeRef t; Token tok0 = la; .)
  "foreach" "("                       (. bool refVar = false; .)
  [ "ref"                             (. refVar = true; .) ]
  type<false, out t>                  (. string i; .)
  ident<out i>                        (. TypeRef indexType = null; string index = null; .)
  [ IF (IsAt()) identifier
    type<false, out indexType>
    ident<out index>
  ]
  "in"                                (. Expression e; .)
  expression<out e> ")"               (. Statement s2; .)
  embeddedStatement<out s2>
  (. s = new EnhancedForeachStatement(tok0, t, i, e, s2, indexType,
                                      index, refVar); .)
  .
Listing 6.4: The modified Coco/R production for the enhanced foreach loop in Zal.
pass they are working on. Figure 6.2 shows a portion of the compiler source file system
structure.
Figure 6.2: An illustration of the Zal compiler source directory structure where class implementation is split across source files using partial classes stored in subdirectories for each stage of compilation.
The language extensions I have proposed involve several new syntactic features. These
syntactic features need to be represented as part of the AST and so several new AST
node types had to be added. The AST nodes added are listed in Table 6.2. The
implementation details of these different AST nodes are the focus of the remainder of
this section.
Node Type                   Description
ContextFormals              List of formal context parameters (OwnershipContexts) attached to a type definition.
UnresolvedContextActuals    A list of context names attached to a type reference which should resolve to OwnershipContexts during type checking.
OwnershipContext            A formal context parameter on a definition which may optionally have constraints in the form of a ConstraintList.
ConstraintList              A set of constraints on a context parameter.
SubcontextDefinition        A subcontext declared in a type definition.
UnresolvedEffects           A list of context names read and written attached to a definition which needs to be resolved to form a set of ResolvedEffects during effect computation.
ResolvedEffects             A list of OwnershipContexts read and written produced as a result of resolving the context names listed in an instance of UnresolvedEffects.
EnhancedForeachStatement    An enhanced foreach loop, derived from the existing ForeachStatement, which stores the additional optional ref keyword and index expressions.

Table 6.2: The AST nodes added to represent the contexts, context constraints, effect declarations, and enhanced foreach loops.
6.3.3 Design of the Pluggable Type System
Starting with version 2, the C] language added support for generics. Adding type
parameters to type definitions and type references requires extensive compiler support
for checking and validation. The compiler infrastructure required to check and validate
generic type parameters is very similar to the infrastructure required to check and
validate other kinds of parameters on types including context parameters.
The term Pluggable Type System was first proposed by Bracha [12]. Bracha uses the
term to mean a system which supports the implementation of a number of optional type
systems. These optional type systems can be used to prove program properties. In his
definition, Bracha specifically mandates that an optional type system must not affect
the runtime semantics of the programming language to which it is applied. Generic
types are one notion of a parameterized type; generic types are types parameterized
with types. Abstracting the framework for generics in the C] language can, therefore,
provide a framework for the implementation of a pluggable type system on top of C].
It is important to note that the implementation of ownership types I have chosen does
not strictly adhere to Bracha’s requirement that an optional type system not affect
the runtime semantics of the language it is applied to. The handling of statics in
conjunction with ownerships is discussed in Section 4.4. The value read from a static
field is affected by the context parameters supplied. Nonetheless, abstracting the type
parameter framework so that it can be used to implement both generics and my type
and effect system would produce a framework that could also be used to implement a
truly pluggable type system.
When we were writing the C] compiler (see Section 6.2), the overall design of the plug-
gable type system infrastructure was not known. Generics were initially implemented
in the compiler. The implementation of generics was then refactored several times dur-
ing the process of adding context parameters to types. During these refactorings, areas
of common functionality were extracted and abstracted. When behavior specialization
of abstracted functionality was required, extension points were added to existing in-
terfaces to allow for this. The end result is the design presented in this section. The
presentation of the design which follows mirrors the development process which took
place; this structure makes the design process and choices made easier to explain. I
begin by discussing the implementation of generics and then abstracting from generics
the infrastructure to support arbitrary type parameters.
6.3.3.1 Generic Types
Generic types are types that are parameterized with one or more formal type param-
eters. The actual types which can be supplied for a given formal type parameter may
be constrained using a constraint clause. Listing 6.5 shows the declaration of a generic
type and the use of a generic type.
public class OrderedList<T> where T : IComparable<T> {
    ...
}
...
OrderedList<int> intList = new OrderedList<int>();
Listing 6.5: An example of the declaration and use of generic types in C].
In the example shown in Listing 6.5, the type OrderedList has a single formal type
parameter T. The where clause on the OrderedList type stipulates that any type
supplied for T must implement the IComparable interface. The local variable
declaration of intList constructs a type reference to the OrderedList with an actual
type parameter of int.
Figure 6.3 shows the AST subtree generated for the class declaration shown in List-
ing 6.5. There are five types of AST nodes which appear in Figure 6.3. Brief descriptions
of these node types are shown in Table 6.3.
Node                        Description
StructuredTypeDefinition    Represents a user-defined type such as a class, struct, enum, or interface.
GenericFormals              A list of formal type parameters which is attached to a StructuredTypeDefinition or MethodDefinition.
TypeParameter               A formal type parameter.
TypeParameterConstraint     A constraint on a formal type parameter; there may be multiple constraints on a single parameter.
UnresolvedGenericActuals    A list of TypeReferences to be used as actual type parameters, to be bound to the formal type parameters of the invoke expression or type reference to which it is attached.
TypeReference               A name reference to a type definition; this is what most programmers would usually, informally, think of as a type name.
Table 6.3: Descriptions of the AST node types which appear in Figure 6.3.
Formal generic type parameters on a definition are represented by TypeParameter
nodes stored in a GenericFormals node attached to the definition. When a type is con-
structed, as happens in the type constraint clause on the type definition shown in List-
ing 6.5, the constructed type is represented by a TypeReference node. TypeReferences
are resolved to TypeDefinitions during type checking. Actual type parameters sup-
plied on a TypeReference are represented by a list of TypeReference nodes stored in
an UnresolvedGenericActuals node attached to the original type reference.
6.3.3.2 Extracting an Abstract Type Parameter Infrastructure
With the discussion of how generics were initially implemented in the compiler com-
plete, I now proceed to discuss how the framework was abstracted to support arbitrary
type parameters. There are three key AST nodes in the implementation of generics in the
compiler: GenericFormals, UnresolvedGenericActuals, and ResolvedGenericActuals.
Figure 6.3: This figure shows the AST subtree generated by our C] compiler for the class declaration shown in Listing 6.5.
Conceptually GenericFormals represent a list of formal type parameters, each with an
associated list of type constraints and UnresolvedGenericActuals represent a list of
actual type parameters. ResolvedGenericActuals is used to store the definitions that
actual context parameters resolve to during type checking. This is fundamental to the
operation of the generics implementation even though it is not shown on the figure.
The nodes which represent the formal and actual parameters themselves, rather than
the lists, are not as important because all of the resolution and validation operations
on a list of parameters are performed by the list holding the parameters. Making this
distinction ensures that any AST nodes desired can be used as parameters to types;
the abstraction will apply only to the containers for these parameters.
To abstract the parameterization of types, it was, therefore, necessary to abstract the
three node types outlined above. The next question was how to abstract these node
types. One option would be to abstract all three node types as C] abstract classes, but
this would limit the flexibility of design since parameters could not inherit from other
classes such as Node. With this in consideration, the decision was taken to abstract the
actual lists of parameters using C] interfaces to maximize flexibility. The constraints
are a rather more distinct entity in the AST and so I chose to abstract them using an
abstract class initially as converting the abstract class into an interface at a later date
would be trivial. Table 6.4 shows the abstract class and interfaces I wrote to abstract
the implementation of type parameters.
Type                      Description
ITypeFormals              Represents a list of formal parameters on a type definition.
IUnresolvedTypeActuals    Represents a list of actual parameters on a type reference before they are resolved.
IResolvedTypeActuals      Represents a list of actual parameters on a type reference after they have been resolved to their definitions.

Table 6.4: The interfaces written to provide an abstract structure for the implementation of type parameters.
In addition to the abstractions above, there needs to be infrastructure added to allow
types to be parameterized with multiple different types of parameters. For example,
when my ownership extensions are in use, a type may be parameterized with both type
and context parameters.
Without additional infrastructure, different parameters on a type or method would need
to be aware of the other types of parameters that may also be present on the type.
To create a maximally flexible framework that would allow parameters to be added
in a modular manner, there needs to be some additional infrastructure to abstract
the parameters’ implementations and associated algorithms from one another. This
infrastructure would give each set of parameters the appearance of being the only
parameters on a type. The abstraction could be violated if necessary, but would simplify
implementation of the most common case.
To achieve the desired abstraction, I wrote three classes to store lists of formal param-
eters, unresolved actual parameters, and resolved actual parameters. Table 6.5 shows
the types for these amalgamated sets of parameters. These amalgamated parameter
implementations implement the standard parameter interfaces shown in Table 6.4. The
implementations simply invoke the same interface method on all of the parameter lists
contained in the amalgamation. Figure 6.4 shows the AST subtree for the type def-
inition shown in Listing 6.5 with the addition of these amalgamated type parameter
wrappers.
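The delegation pattern can be pictured as follows. This is a simplified sketch rather than the GPC] source, and the one-method IFormalsList interface merely stands in for the much larger ITypeFormals shown in Listing 6.6.

// Illustrative sketch of the amalgamated wrapper delegation pattern.
using System.Collections.Generic;

public interface IFormalsList {          // stand-in for ITypeFormals
    void DefineNames();
}

public sealed class AmalgamatedFormalsSketch : IFormalsList {
    private readonly List<IFormalsList> lists = new List<IFormalsList>();

    public void Add(IFormalsList formals) => lists.Add(formals);

    // The wrapper implements the interface by invoking the same member on
    // every parameter list it contains (generic parameters, context
    // parameters, and so on), so each list behaves as if it were alone.
    public void DefineNames() {
        foreach (IFormalsList f in lists)
            f.DefineNames();
    }
}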
Type                             Description
AmalgamatedFormals               Wraps a list of ITypeFormals; that is, it is a list of formal type parameter lists.
AmalgamatedUnresolvedActuals     Wraps a list of IUnresolvedTypeActuals; that is, a list of unresolved actual type parameters on a type reference.
AmalgamatedResolvedActuals       Produced by resolving the type parameter lists in an AmalgamatedUnresolvedActuals.

Table 6.5: The classes used to wrap up lists of type parameters so that different parameter lists do not need to be aware of one another and so that parameters can be checked and resolved collectively.
Having created these wrappers, I deduced which methods these abstractions needed to implement while implementing ownership context checking and validation. The consistency and correctness of type references are determined when the
type reference is resolved to a type definition. Once the type reference has been resolved
to a type definition, the definition can be checked for assignment compatibility with
other types to ensure the overall correctness and consistency of the program. There are
a number of methods used to do this, but the key methods are summarized in Table 6.6.
Figure 6.4: This figure shows the AST subtree generated by our C] compiler for the class declaration shown in Listing 6.5 after the amalgamated type parameter wrappers have been added.
Listing 6.6 shows the full declaration of the interfaces showing the functionality exposed.
6.3.4 Effect Calculation and Validation
The Zal compiler executes two passes over the Abstract Syntax Tree (AST) to compute
effects: one to compute heap effects and one to compute stack effects. These two effect
computation passes require different computation contexts and so are most logically
implemented separately. The first pass computes all of the heap side-effects of all the
expressions and statements in the program and validates the method effect signatures
against the computed effects of their respective bodies. The second computes the stack
effects for all the statements and expressions in the program.
6.3.4.1 Heap Effects
Heap effect computation is implemented recursively over the nodes of the AST. Each
node has a computeEffects method. This method recursively calls the computeEffects
on any child statements or expressions in the AST. Once that is done, the method com-
putes the overall effect for the node and stores it in the node’s sideEffects variable.
The caching of effects on the nodes is done so that effects do not need to be repeatedly
recomputed as different data dependency analyses are performed on the AST.
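Schematically, the caching behaves as below. This is a sketch rather than the compiler's exact code: the stub types stand in for the real ResolvedEffects, IScope, and OwnershipEnv classes, and the CachedEffects accessor is a hypothetical illustration of the compute-once pattern.

// Illustrative sketch of caching computed effects on AST nodes.
public class ResolvedEffects { }   // stub for the compiler's effect summary
public interface IScope { }        // stub for the name resolution scope
public class OwnershipEnv { }      // stub for the effect computation environment

public abstract class NodeSketch {
    protected ResolvedEffects sideEffects;   // cached after first computation

    // Hypothetical accessor: compute once, then reuse for later analyses.
    public ResolvedEffects CachedEffects(IScope scope, OwnershipEnv env) {
        if (sideEffects == null)
            sideEffects = computeEffects(scope, env, false);
        return sideEffects;
    }

    public abstract ResolvedEffects computeEffects(IScope scope,
                                                   OwnershipEnv env,
                                                   bool writing);
}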
computeEffects takes three parameters: the current scope (used to resolve names and
types not cached during type checking when required), an effect computation environ-
ment which contains context relationship information as well as a reference to the type
definition the current code is in, and a boolean flag used to indicate if an expression is
being read or written (the flag is ignored when the effects of non-expressions are being
computed). The computeEffects method returns the overall effect of executing the
subtree of the AST rooted at the node it is invoked on.
Listing 6.7 shows the computeEffects method for a BlockStatement. The block
statement effect computation computes the side-effects of all the statements it contains
and unions them together to produce the total effect of the statement block as shown
by the formal type rule in Figure 6.5. The condition checking for the Rewriting flag
Method                   Description
ResolveNameHere          Used to lookup formal type parameters as part of the process of resolving unresolved references.
UnfixedParameters        Part of type parameter inference on method calls.
InferredParameters       Part of type parameter inference on method calls.
SameActualsAs            Used to test for type equality between two types parameterized with type parameters.
CompatibleActualsWith    Used to test for assignment compatibility between two types parameterized with type parameters.
Resolve                  Called to convert lists of unresolved parameters into resolved parameters.

Table 6.6: Key methods from the interfaces used to abstract type parameters to create a pluggable type system.
public partial interface ITypeParameters {
    string mangle(string name);
}

public partial interface ITypeFormals : ITypeParameters {
    void TypeCheck(IScope scope, OverflowChecks check);
    void DefineNames();
    Definition ResolveNameHere(string name);
    TypeContainer ResolveTypeOrNamespaceHere(string name);
    IUnresolvedTypeActuals Unresolve();
    bool NotSpecified();
    IResolvedTypeActuals UnfixedParameters();
    IResolvedTypeActuals InferredParameters();
    IResolvedTypeActuals actuals { get; set; }
    bool SameActualsAs(ITypeFormals other);
    bool CompatibleActualsWith(ITypeFormals other);
    ITypeFormals Clone();
    List<ITypeFormals> FormalsOmittingInferrables();
    GenericFormals GetTypeParam();
    bool InstanceNeeded(IResolvedTypeActuals actuals);
    void CheckValid(IScope scope, Token pos);
    ResolvedEffects ResolveEffects(IScope scope);
    OwnershipContext GetOwner();
}

public partial interface IUnresolvedTypeActuals : ITypeParameters {
    IResolvedTypeActuals Resolve(IScope scope);
    IUnresolvedTypeActuals Clone();
}

public partial interface IResolvedTypeActuals : ITypeParameters {
}

Listing 6.6: The key interfaces used to abstract the different kinds of type parameter lists and the methods declared on them.
is for the conceptual re-writing discussed in Section 6.3.4.3.
Γ; ∆ ⊢ s : ϕ        Γ; ∆ ⊢ { s̄ } : ϕ′
───────────────────────────────────────  (Eff-Block)
Γ; ∆ ⊢ { s; s̄ } : ϕ ∪ ϕ′

Figure 6.5: The effect rule for a statement block.
public override ResolvedEffects computeEffects(IScope scope,
        OwnershipEnv env, bool writing) {
    ResolvedEffects sideEffects = EffectsFactory.createEmptyEffect();
    if (statementList != null)
        foreach (Statement s in statementList)
            sideEffects = sideEffects.Union(env,
                s.computeEffects(this, env, false));
    return sideEffects;
}
Listing 6.7: The method for computing the heap effect of a statement block.
The source of all heap effects is reads and writes of fields. A NameExpression is the AST node which represents a field or variable name. Listing 6.8 shows the computeEffects method for NameExpressions. The method determines what the
name refers to and from that determines what the effect of reading that name should
be.
6.3.4.2 Stack Effects
Stack effects are computed recursively over the AST using the method LocalEffects
in a similar manner to the computation of heap effects using computeEffects. The
computation of stack effects is, generally, simpler than that of heap effects: heap contexts can be aliased and can dominate one another, while local variable names are unique and are simply tracked as sets of variable names read and written. The most complex effect
computation steps are for member expressions and statement blocks.
Member expressions must account for the reading and writing of structs allocated on
the stack. The this context of these structs stored in the stack needs to be mapped to
an appropriate stack effect. The stack location read or written has a unique name, the
name of the local variable which stores it. If the struct were stored in a heap allocated
public override ResolvedEffects computeEffects(IScope scope, OwnershipEnv env,
        bool writing) {
    ResolvedEffects sideEffects = EffectsFactory.createEmptyEffect();
    if (d == null)
        throw new SemanticError(pos, 0, "Can’t resolve name 0", name);
    else if (d is FieldDefinition) {
        if (((FieldDefinition)d).Has(Modifiers.Readonly) ||
                (!d.Has(Modifiers.Static) &&
                 ((FieldDefinition)d).outerTypeScope.kind == Kind.Struct))
            sideEffects = EffectsFactory.createEmptyEffect();
        else if (!d.isStaticScope) {
            if (writing)
                sideEffects = sideEffects.Union(env, new ResolvedEffects(
                    new ResolvedContextActuals(new List<OwnershipContext>()),
                    new ResolvedContextActuals(new List<OwnershipContext>() {
                        ((FieldDefinition)d).computeContext(scope, env) })));
            else
                sideEffects = sideEffects.Union(env, new ResolvedEffects(
                    new ResolvedContextActuals(new List<OwnershipContext>() {
                        ((FieldDefinition)d).computeContext(scope, env) }),
                    new ResolvedContextActuals(new List<OwnershipContext>())));
        } else {
            if (writing) {
                sideEffects.WriteEffects.parameters.Clear();
                sideEffects.WriteEffects.parameters.Add(new OwnershipContext(null,
                    ((NamedTypeRef)((FieldDefinition)d).outerTypeScope
                        .ToTypeRef()).ToString(), true));
            } else {
                sideEffects.ReadEffects.parameters.Clear();
                sideEffects.ReadEffects.parameters.Add(new OwnershipContext(null,
                    ((NamedTypeRef)((FieldDefinition)d).outerTypeScope
                        .ToTypeRef()).ToString(), true));
            }
        }
    }
    else if (d is PropertyDefinition) {
        if (writing)
            sideEffects = raiseEffects(env,
                ((PropertyDefinition)d).ResolveSetEffects(),
                ((PropertyDefinition)d).outerTypeScope, env.Rewriting);
        else
            sideEffects = raiseEffects(env,
                ((PropertyDefinition)d).ResolveGetEffects(),
                ((PropertyDefinition)d).outerTypeScope, env.Rewriting);
    }
    else
        sideEffects = EffectsFactory.createWorldEffect();
    return sideEffects;
}

Listing 6.8: The computeEffects method of the NameExpression AST node.
partial class MemberExpression {
    public override Effect<string> LocalEffects(IScope scope, bool writing) {
        Effect<string> toReturn = e.LocalEffects(scope, false);
        if ((writing && e.t is StructuredTypeDefinition &&
             ((StructuredTypeDefinition)e.t).Kind == Kind.Struct) ||
            (Binding is MethodDefinition &&
             ((MethodDefinition)Binding).outerTypeScope.Kind == Kind.Struct &&
             ((MethodDefinition)Binding).ResolveEffects().WriteEffects.parameters
                 .Contains(ResolvedContextActuals.thisContext)))
            toReturn.Union(e.LocalEffects(scope, true));
        return toReturn;
    }
}

partial class NameExpression {
    public override Effect<string> LocalEffects(IScope scope, bool writing) {
        Effect<string> effects = new Effect<string>();
        Definition d = name.ResolveNameAnywhere(scope);
        if (d is LocalDefinition || d is StructuredTypeDefinition) {
            if (writing)
                effects.AddWrite(name.i);
            else
                effects.AddRead(name.i);
        }
        return effects;
    }
}

partial class BlockStatement {
    public override Effect<string> LocalEffects(IScope scope) {
        Effect<string> effects = new Effect<string>();
        if (statementList != null) {
            foreach (Statement s in statementList)
                effects.Union(s.LocalEffects(this));
            foreach (string varName in LocalsList()) {
                effects.GetReadEffect().Remove(varName);
                effects.GetWriteEffect().Remove(varName);
            }
        }
        return effects;
    }

    protected List<string> LocalsList() {
        List<string> locals = new List<string>();
        foreach (Statement s in statementList)
            if (s is LocalVarDefStatement)
                locals.AddRange(((LocalVarDefStatement)s).GenSet());
        return locals;
    }
}
Listing 6.9: The implementations of the local effect computation method LocalEffects on member and name expressions as well as block statements.
object, the this effect becomes the this context, a heap effect, of the containing object.
If it is nested inside another struct the effect continues to be propagated “up” until
either a heap allocated object or a local variable is reached. Listing 6.10 shows an
example of this. There are no heap allocated objects in the example and so there are
no heap effects. Notice that the stack effects calculated for the expressions computing
the rectangle height and width read only the r1 stack variable.
public struct Point {
    public int x;
    public int y;
    public Point(int x, int y) reads <> writes <> {
        this.x = x; this.y = y;
    }
}
public struct Rectangle {
    public Point topLeft;
    public Point bottomRight;
    public Rectangle(Point topLeft, Point bottomRight) reads <> writes <> {
        this.topLeft = topLeft;
        this.bottomRight = bottomRight;
    }
}
...
Point p1 = new Point(1,1);                       // writes p1
Point p2 = new Point(3,4);                       // writes p2
Rectangle r1 = new Rectangle(p1,p2);             // reads p1,p2 writes r1
int width = r1.bottomRight.x - r1.topLeft.x;     // reads r1
int height = r1.bottomRight.y - r1.topLeft.y;    // reads r1
Listing 6.10: An example of the mapping of struct this contexts to stack variables during stack effect computation.
Statement blocks remove local variables, whose scope is the block, from their effect
sets; this reflects the lexical scoping of variables. The variables declared in a
block are internal details of the block; code outside the block's scope cannot access
them. The removal of these variables from the block's
effect set hides the block’s implementation details.
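For example, in the following illustrative fragment (the variable names are hypothetical), the effects on the block-local variable tmp are hidden while those on the enclosing variables remain visible:

int a = 1, b = 2, total = 0;
{
    int tmp = a + b;     // within the block: reads a, b; writes tmp
    total = tmp * 2;     // within the block: reads tmp; writes total
}
// Local effects reported for the block: reads a, b; writes total.
// The reads and writes of tmp are removed because tmp is scoped to the block.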
Listing 6.9 shows the implementation of the member expression, name expression and
block statement LocalEffects methods.
6.3.4.3 Loop Body Rewriting
The sufficient conditions developed for data parallel loops in Section 2.4.1.1 apply
to a simple foreach loop in which a single method is invoked on the loop's iteration
variable. To handle arbitrary loop bodies, I presented the idea of a conceptual rewriting
whereby the loop body was transformed into a method on the iteration variable. When
I presented this idea I emphasized that this was a conceptual rewriting and that,
in practice, the same results could be obtained by modifying the effect computation
algorithms. In this section I present how this conceptual rewriting was implemented in
the Zal compiler.
Heap effects are computed recursively over the AST using the computeEffects method.
The computeEffects method takes three parameters: the current scope of evaluation
(used to resolve names and types), an OwnershipEnv environment object, and a bool
flag which determines if the current expression is being used as an l-value or r-value. The
OwnershipEnv object is used to carry information down through the effect computation
process. It carries several different pieces of information:
• The current type being processed
• The current subroutine construct being evaluated (method, constructor, property
accessor, etc.)
• Context relationship information
• The current rewrite variable, if any
The OwnershipEnv is the key to the conceptual rewriting process. Unless an effect
computation for a conceptual rewriting is in process, the current rewrite variable is
unset. When a conceptual rewrite is needed, the following changes are made to a copy
of the current OwnershipEnv:
• set the rewrite variable type and name to that of the iteration variable
• clear any context relationships from the environment that pertain to the current
this context and add the element type’s owner as a dominator of this.
The OwnershipEnv method has a read-only property called Rewriting which tests if
the rewrite variable information has been set. This modified OwnershipEnv object is
then passed to the computeEffects method of the loop body. All of the expressions
which add contexts to the read and write effect sets check this property to see if a
rewrite operation is in progress. If a rewrite is in progress, the following changes are
made to the way effects are computed:
• Reads of fields via the this variable (directly or indirectly) cause a read of the
current type’s owning context, if there is one.
• Writes to fields via the this variable (directly or by implication) cause a write of
the current type’s owning context, if there is one.
• Reads of fields via the rewrite variable cause a read of the this context.
• Writes of fields via the rewrite variable cause a write of the this context.
• When a method is invoked, if it is invoked via the rewrite variable then the
this context is not replaced with the object owner, otherwise the this context is
replaced with the owner of the type on which the method was invoked, if there is
one. This is done since the this context is nameable only within the object whose
representation it holds.
This set of rules basically ensures that the effect set computed for the loop body is
expressed in terms of effects from the perspective of the iteration variable, just as
would be the case if the loop body was actually rewritten to be a method on the
iteration variable.
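As a small illustrative sketch (the type and member names are hypothetical), consider a loop body that touches fields both via the iteration variable and via this; the comments show how each access is abstracted when the rewrite variable is set to the iteration variable a:

foreach (Account a in accounts) {
    a.balance = a.balance * rate;         // access via the rewrite variable:
                                          // abstracted to a read and write of
                                          // the this context
    this.total = this.total + a.balance;  // access via the this variable:
                                          // abstracted to a read and write of
                                          // the enclosing type's owning context
}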
This same rewriting process can be used to compute the side-effects of extension meth-
ods since the effect of an extension method should appear to be effects on the extension
parameter object as was discussed in Section 4.3.2.7. There is one interesting corner
case to consider when conducting a rewrite of a loop inside a struct (a value type). User
defined value types do not have owners, so reads and writes of fields via the this variable
cannot be abstracted to reads and writes of the type's owning context; the rewriting would
simply hide these effects. To avoid missing dependencies created via the fields of the
current value type, there needs to be a check, made with the loop body's effects computed
normally, that the body does not write to the this context. If such a write takes place, a
loop carried dependence could exist, which is prohibited by my sufficient conditions.
6.4 Zal Code Generation
I have written several libraries of data structures and helper methods which are used
by the Zal compiler’s code generation stage to facilitate the implementation of Zal
programs in C]. The most important of these data structures and methods are those
which facilitate runtime tracking of ownership and the implementation of types an-
notated with runtime ownership information. In addition to this core functionality, I
have also written libraries to support the implementation of enhanced foreach loops
in C] and the pipelining of foreach loops. This section discusses how the compiler im-
plements ownerships in C] during the BuildOwnershipImplementation pass and the
design of the runtime libraries used to support this implementation. It also discusses
other runtime libraries written to simplify other implementation steps performed by
the compiler.
6.4.1 Runtime Ownership Implementation and Tracking
Reasoning about inherent parallelism using effects expressed using contexts requires
there to be some basis for determining if two arbitrary contexts are disjoint. While
there are a number of language features that can be used to facilitate static reasoning
about this disjointness (see Section 3.4), there are still a number of cases where it is
desirable to perform runtime tests on the relationship between two arbitrary contexts
as was discussed in Section 3.4.4. This means that when programs written in Zal are
compiled to C], additional fields and methods need to be added to the emitted code to
track ownership at runtime. In this subsection I present the design and implementation
of this runtime ownership tracking system as well as that of the runtime libraries used
to facilitate this tracking and context relationship testing at runtime.
6.4.1.1 Ownership Implementation
The tracking of ownerships at runtime can be achieved through the addition of methods
and fields to the classes emitted by the Zal compiler. This implementation approach
allows ownership tracking to be implemented without modifying the Common Language
Runtime (CLR), although modifying the CLR directly might be preferable.
Because Zal is implemented in C] it is desirable to be able to tell which types in
a program support ownership tracking and which do not. To do this I introduced
the IOwnership interface. All classes emitted by the Zal compiler implement this
interface. This means objects modified to support the runtime tracking of ownerships
can be distinguished from those which do not by testing to determine if the IOwnership
interface is implemented. In cases where the interface is not implemented, the runtime
system uses safe default ownership and effect information.
The methods in the IOwnership interface are determined by the data structure being
used to store ownership information and provide the minimum functionality required
to traverse the ownership tree and test contexts for disjointness, as discussed in
Section 3.4.4. In Section 3.4.4, several different structures were discussed including
simple parent pointers, Dijkstra views, and skip lists. Out of these different structures,
I have chosen to implement two ownership tracking systems. One of these implemen-
tations uses Dijkstra Views, where each object keeps an array of references to all of its
ancestors back up to the special top context world. The other uses the simple parent
pointer system where each object maintains a pointer to its parent. To help improve
the performance of runtime tests in the simple parent pointer system, I have elected to
cache the depth of each object in the ownership hierarchy on the object. This allows
a best and worst case complexity of O(n −m) where m and n are the depths of the
objects being compared.
To reduce the amount of code that needs to be emitted into each class and facilitate
easy modification of the ownership tracking system I chose to expose all necessary ob-
ject ownership tracking functionality through the IOwnership interface. This interface
allows library code to be used for most of the functionality with only accessor and mu-
tator methods required on the objects themselves. I have created two versions of the
IOwnership interface, one for each of the two different systems I have implemented.
The Zal compiler can be toggled to produce C] code conforming to either of these
interfaces; the choice of which to use is made based on the ratio of object creations to
context relationship tests as I discuss later in this chapter.
The Dijkstra views implementation has the advantage of providing constant time run-
time tests for the relationship between contexts, but the disadvantage of incurring
object creation time and memory usage overheads proportional to the height of the
ownership tree. Objects tracking ownership information using Dijkstra views, there-
fore, needed to provide some means of (1) initially setting the array of ancestor contexts,
(2) getting the array of ancestors, (3) get the object’s immediate owner, and (4) the
current depth of the object in the ownership tree. Listing 6.11 shows the methods
of the Dijkstra View’s version of the IOwnership interface. Notice that the arrays of
ancestors operate on arrays of objects not arrays of IOwnership types. This is so that
ordinary C] objects can be made to supply the default owner context of world for the
purposes of effect calculation.
public interface IOwnership {
    object[] GetParents();
    void SetParents(object[] value);
    object Owner();
    int Depth();
}
Listing 6.11: The IOwnership interface implemented by all types emitted by the Zal compiler when Dijkstra Views based tracking is selected.
The parent pointer based runtime system has the advantage of minimizing the memory
and object creation overheads incurred at the expense of slower context relationship
tests. In this system, objects must provide some means of (1) initially setting their
parent pointer and (2) getting their parent pointer. As an optimization, as previously
discussed, objects can also provide their depth in the ownership hierarchy. Listing 6.12
shows the methods of the parent pointer version of the IOwnership interface. As was
the case with the Dijkstra Views implementation, notice that the parent is an object
and not an instance of IOwnership.
public interface IOwnership {
    object Owner();
    void SetOwner(object value);
    int Depth();
}
Listing 6.12: The IOwnership interface implemented by all types emitted by the Zal compiler when parent pointer based tracking is selected.
In addition to implementing one of these IOwnership interfaces, classes emitted by the
Zal compiler also have fields added to store any additional context parameters declared
on the Zal class. No additional methods are required to manipulate these fields since
they are used only within the class’s implementation.
For the sake of brevity and clarity, the remainder of the implementation will be de-
scribed using the pointer based version of the IOwnership interface. The Dijkstra
Views based implementation can be trivially derived from the presented code and is
also available for download from [32].
So that the methods of the IOwnership interface can be called on any object, I wrote a
class of extension methods which call the appropriate ownership methods depending on
the type of the objects passed in. There is separate handling of IOwnership instances,
arrays, and non-IOwnership instances. The special handling of arrays is discussed
in detail in Section 6.4.1.4. Listing 6.13 shows the details of this class of extension
methods.
Once a Zal program has been compiled, other Zal programs may link against that
program to make use of one or more parts of its functionality. To facilitate reasoning
across compiled code modules, it is desirable to preserve formal context parameter and
effect declarations in an easily accessible format. This lets the Zal compiler read this
information from these compiled programs so that it can be used when compiling other
programs. In C], custom attributes provide one means of storing such information on
definitions. Table 6.7 lists the custom attributes I have written to store context parameter
and effect information on definitions. These attributes are emitted in the C] source
code produced by the Zal compiler.
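As an illustration (a sketch, not the Zal compiler's actual loading code), the declared effects of a compiled method could be recovered through standard reflection over these attributes; the type and method names used here are hypothetical:

using System;
using System.Reflection;

static class EffectMetadataReader {
    // Reads the declared effect attributes from a method of a compiled Zal type.
    public static void ReadDeclaredEffects(Type compiledZalType) {
        MethodInfo m = compiledZalType.GetMethod("operation");   // hypothetical method name
        var reads = (ReadEffectAttribute)Attribute.GetCustomAttribute(
            m, typeof(ReadEffectAttribute));
        var writes = (WriteEffectAttribute)Attribute.GetCustomAttribute(
            m, typeof(WriteEffectAttribute));
        // 'reads' and 'writes' now carry the context names recorded by the compiler.
    }
}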
In addition to the custom attributes, I have also written a class of extension methods
which are used to reduce the amount of code that the compiler needs to programmati-
cally inject into the generated code. Two items in this library are used as part of the
ownership tracking process and these are shown in Listing 6.14. The OwnershipWorld
type referred to in Listing 6.14 implements IOwnership and an instance of this type
is used to represent the special top context world at runtime. The depth of world is
always 0 and as the top context it does not have any parents and so returns an empty
array when asked for its parents and null when asked for its owner.
When an object is instantiated, actual context parameters are supplied for the type’s
formal context parameters. Context relationship testing may involve these context
parameters so they need to be captured as part of the object’s representation. To
public static class ObjectOwnershipExtensions {
    public static object Owner(this object obj) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null)
            return ownedObj.Owner();
        Array arrayObj = obj as Array;
        if (arrayObj != null)
            return arrayObj.Owner();
        return OwnershipHelpers.world;
    }

    public static void SetOwner(this object obj, object owner) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null) {
            ownedObj.SetOwner(owner);
            return;
        }
        Array arrayObj = obj as Array;
        if (arrayObj != null) {
            arrayObj.SetOwner(owner);
            return;
        }
        throw new OwnershipException(
            "Cannot transfer the ownership of a non-IOwnership non-Array type");
    }

    public static int Depth(this object obj) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null)
            return ownedObj.Depth();
        Array arrayObj = obj as Array;
        if (arrayObj != null)
            return arrayObj.Depth();
        return 1;
    }

    public static object AddChild(this object o, object toAdd) {
        toAdd.SetOwner(o);
        return o;
    }
}
Listing 6.13: Extension methods on object which allow ownership properties to be read from any object and set on any object which supports it, as implemented for the parent pointer version of the IOwnership interface.
Attribute                           Usage
FormalContextParametersAttribute    Stores a list of formal context parameters
                                    declared on a type or method definition.
ReadEffectAttribute                 Stores the declared read effects of a method,
                                    constructor, or delegate definition.
WriteEffectAttribute                Stores the declared write effects of a method,
                                    constructor, or delegate definition.
GetReadEffectAttribute              Stores the declared get accessor read effects
                                    of a property or indexer.
GetWriteEffectAttribute             Stores the declared get accessor write effects
                                    of a property or indexer.
SetReadEffectAttribute              Stores the declared set accessor read effects
                                    of a property or indexer.
SetWriteEffectAttribute             Stores the declared set accessor write effects
                                    of a property or indexer.
Table 6.7: Table of custom attributes used to store declared context parameters and effect information. These attributes are emitted into C] source code produced by the Zal compiler.
public static class OwnershipHelpers {
    private static OwnershipWorld w;
    public static OwnershipWorld world {
        get {
            if (w == null)
                w = new OwnershipWorld();
            return w;
        }
    }
}
Listing 6.14: The items in the OwnershipHelpers class which are used to facilitate the implementation of ownership tracking.
do this, the compiler adds a private field to store each context parameter to the type
definition. The Zal compiler emits a constructor which supports the runtime tracking of
ownership and, optionally, an unmodified constructor for use by existing C] programs.
The ownership tracking version adds parameters to the constructor’s signature for
each of the type’s context parameters; context parameters are passed as objects at
runtime. When a new object instantiation is emitted by the compiler, it supplies the
actual context parameters as additional parameters to the constructor. The unmodified
version of the constructor for use by existing C] programs leaves all of the context
parameter fields set to their default initial value of the top context world. Finally, the
declared context parameters are stored on the emitted type definition using a custom
attribute.
Listing 6.15 shows a small example class written in Zal which has some formal context
parameters and a set of declared read and write effects on the constructor. It also shows
how the example class is implemented in C] by the Zal compiler using the previously
mentioned libraries, interfaces and code injection techniques.
// Zal class declaration
public class ExampleOwnershipClass[owner, data] {
    public ExampleOwnershipClass(int value) reads <> writes <> {
        ...
    }
}

// C] implementation of the above class declaration
[FormalContextParameters("owner", "data")]
public class ExampleOwnershipClass : IOwnership {
    private IOwnership _Context_owner;
    private IOwnership _Context_data;

    [ReadEffect(), WriteEffect()]
    public ExampleOwnershipClass(int value) {
        _Context_owner = OwnershipHelpers.world;
        _Context_data = OwnershipHelpers.world;
        OwnershipHelpers.world.AddChild(this);
        ...
    }

    [ReadEffect(), WriteEffect()]
    public ExampleOwnershipClass(int value, IOwnership _Context_ownerparm,
                                 IOwnership _Context_dataparm) {
        _Context_owner = _Context_ownerparm;
        _Context_owner.AddChild(this);
        _Context_data = _Context_dataparm;
        ...
    }
}
Listing 6.15: The implementation of a Zal class, with formal context parameters and declared constructor effects, in C] using custom attributes and the OwnershipHelpers library of helper methods.
Method formal context parameters are transformed into call parameters and local vari-
ables by the Zal compiler. As was the case with type constructors, the compiler can emit
two versions of all methods annotated with formal context parameters. One method
has the same list of call parameters as the original method declaration. This method
adds local variables at the start of the method body which correspond to the declared
method formal context parameters, if any. This is done so that even if the method
is called from code that is not ownership aware, ownership aware code called by the
method is supplied reasonable default ownership information. The second version of
the method generated adds the method formal context parameters as call parameters
so they can be supplied at method invocation time. An example of this implementa-
tion is shown in Listing 6.16. It is interesting to note that because no constructors are
declared for the MethodExample class, the compiler emits a default constructor so that
the class’s formal context parameters can be handled as previously discussed.
// An example of a method with formal context parameters written in Zal
public class MethodExample[owner] {
    public void operation[contextParm](Object|contextParm| value)
            reads <this, contextParm> writes <> {
        ...
    }
}

// The above example as implemented in C] by the Zal compiler
[FormalContextParameters("owner")]
public class MethodExample : IOwnership {
    private IOwnership _Context_owner = OwnershipHelpers.world;

    [ReadEffect(), WriteEffect()]
    public MethodExample(IOwnership _Context_ownerparm) : base() {
        _Context_owner = _Context_ownerparm;
    }

    [FormalContextParameters("contextParm")]
    [ReadEffect("this", "contextParm"), WriteEffect()]
    public void operation(Object value, IOwnership _Context_contextParm) {
        ...
    }
    ...
}
Listing 6.16: The implementation of a Zal method with declared formal context pa-rameters and effects, in C] using custom attributes and OwnershipHelpers.
It is important to note that when constructors or methods declare side-effects, the
compiler will emit ReadEffects and WriteEffects annotations with these declared
effects. Even when constructors and methods do not declare their effects, the compiler
emits the effect annotations with the computed read and write effects of the constructor
or method. This is done so that the compiler can read in effect information from the
Zal DLLs during compilation rather than trying to recompute the effects.
6.4.1.2 Properties & Indexers
As was discussed in Section 4.3.2.1 and Section 4.3.2.2, properties and indexers are
both syntactic sugar for methods. Unlike methods, neither indexers nor properties
can be parameterized with formal context parameters. However like methods, the Zal
compiler emits either the declared, when present, or computed effects for the get and
set accessors in properties and indexers.
C] does not support attributes on accessors and so their effect information needs to be
stored in attributes on the accessor’s containing indexer or property. This means that
an indexer or property may be parameterized by two different sets of read and write
effects, one each for its get and set accessors. To distinguish the effect sets for the get
accessor from those of the set accessor, I have extended the custom attributes used
to store method effects (ReadEffects and WriteEffects). The derived extensions do
not modify the functionality of the originally declared ReadEffects and WriteEffects
attributes, but the different types are used to distinguish the effect declarations. The
GetReadEffects and GetWriteEffects attributes store the side-effects of the get acces-
sor while the SetReadEffects and SetWriteEffects do the same for the set accessor.
Because indexers and properties are not parameterized with formal context parameters,
the effects can be named only by using the formal context parameters available in the
surrounding type definition scope.
Listing 6.17 shows a Zal class with a property that has declared read and write effects
as well as how that class would be implemented by the compiler using the custom
attributes discussed. Notice the four effect attributes which correspond to the read
and write effects of the get and set accessors since C] does not allow accessors to have
attributes. The effect declaration syntax and attributes for indexers are the same as
those shown for the property.
// An example of a Zal class with a property
public class PropertyExample[owner, data] {
    public int Value {
        get reads <this,data> writes <> { ... }
        set reads <this> writes <data> { ... }
    }
}

// The above example as implemented in C]
[FormalContextParameters("owner", "data")]
public class PropertyExample : IOwnership {
    [GetReadEffects("this", "data"), GetWriteEffects(),
     SetReadEffects("this"), SetWriteEffects("data")]
    public int Value {
        get { ... }
        set { ... }
    }
}
Listing 6.17: An example of the implementation of a Zal property in C].
6.4.1.3 Sub-contexts
Like other contexts in the system, sub-contexts are represented by objects at run-
time. Because sub-contexts do not correspond naturally with existing objects in the
program, special objects need to be instantiated to represent them. The runtime owner-
ship library provides the SubContext class which implements the IOwnership interface.
Instances of this class store only their ownership information (a parent reference in the
parent pointer version shown here) and provide a minimal implementation of the interface. Listing 6.18 shows the implementation of the
SubContext class.
public class SubContext : IOwnership {
    private object parent;

    public SubContext(object owner) {
        owner.AddChild(this);
        parent = owner;
    }

    public object Owner() {
        return parent;
    }

    public void SetOwner(object owner) {
        parent = owner;
    }

    public int Depth() {
        return parent.Depth() + 1;
    }
}
Listing 6.18: The implementation of the SubContext class which is used to represent sub-contexts declared in type definitions.
In a type which declares sub-contexts, each sub-context is emitted as an instance field
which is initialized to a new instance of the SubContext class. The sub-context can
then be passed around or used in relationship tests just as with any other context.
Listing 6.19 is an example of a node in a binary tree which has two sub-contexts to
hold the representations of the left and right branches respectively. When compiled to
C], notice that the sub-context declarations become fields which are set to new instances
of the SubContext type. The subcontext objects are then passed to constructors as
with any other context.
// a binary tree node implementation in Zal
public class BinaryTreeNode<T>[owner] {
    subcontexts l, r;
    private BinaryTreeNode<T>|l| left;
    private BinaryTreeNode<T>|r| right;
    private T data;

    public BinaryTreeNode(T data) reads <> writes <> {
        this.data = data;
    }
    public void addLeft(T data) reads <l> writes <> {
        left = new BinaryTreeNode<T>|l|(data);
    }
    public void addRight(T data) reads <r> writes <> {
        right = new BinaryTreeNode<T>|r|(data);
    }
    ...
}

// implementation of the above class in C]
[FormalContextParameters("owner")]
public class BinaryTreeNode<T> : IOwnership {
    private IOwnership _Context_owner = OwnershipHelpers.world;
    private IOwnership _Context_l;
    private IOwnership _Context_r;
    private BinaryTreeNode<T> left;
    private BinaryTreeNode<T> right;
    private T data;

    [ReadEffect(), WriteEffect()]
    public BinaryTreeNode(T data, IOwnership _Context_ownerparm) {
        _Context_owner = _Context_ownerparm;
        _Context_l = new SubContext(this);
        _Context_r = new SubContext(this);
        this.data = data;
    }
    [ReadEffect(), WriteEffect("l")]
    public void addLeft(T data) {
        left = new BinaryTreeNode<T>(data, _Context_l);
    }
    [ReadEffect(), WriteEffect("r")]
    public void addRight(T data) {
        right = new BinaryTreeNode<T>(data, _Context_r);
    }
    ...
}
Listing 6.19: An example of the use of sub-contexts as part of the implementation of a binary tree node.
6.4.1.4 Arrays
All arrays in C] are instances of the System.Array type. This type does not implement
the IOwnership interface since it is supplied as part of the .NET Platform. As discussed
in Section 4.3.3.4, each array in Zal can have an owning context parameter. If no owner
is specified, the array defaults to being owned by the world context. The owner of an
array needs to be tracked at runtime, just as for any other object with formal context
parameters. Since modifying the implementation of the System.Array type directly
is not an option, I have written a static helper class, ArrayOwnershipExtensions, to
store each array’s ownership information.
The static helper class, ArrayOwnershipExtensions, maintains a dictionary which
maps array objects to their respective owning contexts. To avoid having this helper
class cause memory leaks by holding on to array references, the dictionary used is a
custom written WeakDictionary which stores its keys using weak references which do
not contribute to an object’s incoming reference count for garbage collection purposes.
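As an illustration of the weak-key idea only (this is not the thesis's WeakDictionary implementation), a similar mapping could be built on the standard ConditionalWeakTable, which also holds its keys weakly:

using System;
using System.Runtime.CompilerServices;

public static class ArrayOwnerTable {
    // Maps an array to its owning context without keeping the array alive.
    private static readonly ConditionalWeakTable<Array, object> owners =
        new ConditionalWeakTable<Array, object>();

    public static void SetOwner(Array array, object owner) {
        owners.Remove(array);        // replace any existing entry for this array
        owners.Add(array, owner);
    }

    public static object GetOwner(Array array) {
        object owner;
        return owners.TryGetValue(array, out owner) ? owner : null;
    }
}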
The helper class also supplies a number of extension methods which add the methods
of the IOwnership interface to the System.Array type. This allows the methods of the
IOwnership interface to be invoked on arrays. Listing 6.20 shows the implementation
of the ArrayOwnershipExtensions class.
public static class ArrayOwnershipExtensions {
    private static WeakDictionary<Array, object[]> dictionary =
        new WeakDictionary<Array, object[]>(new ArrayComparator());

    public static void Owner(this Array array, object value) {
        dictionary[array] = value;
    }
    public static int Depth(this Array array) {
        return dictionary.ContainsKey(array) ?
            dictionary[array].Length : 1;
    }
    public static object Owner(this Array array) {
        return dictionary.ContainsKey(array) ?
            dictionary[array][dictionary[array].Length - 1] :
            OwnershipHelpers.world;
    }
}
Listing 6.20: The static ArrayOwnershipExtensions class which stores ownership information for arrays and provides access to this information via extension methods on the System.Array type which provide the same functionality as required of types which implement the IOwnership interface, as implemented for the parent pointer based system.
Listing 6.21 shows an example of an array in Zal which has an ownership parameter
and how that is implemented by the compiler. The special handling of the arrays
shown in Listing 6.13 is necessary because the System.Array type does not implement
IOwnership even though it supplies the methods required. Although extension methods
allow methods to be added to an existing type, they do not allow additional interfaces
to be implemented. The special handling is, therefore, required to invoke the array
versions of the IOwnership methods.
public class ArrayExample[owner] {
    public Object|owner|[]|this| value;
    public ArrayExample() reads <> writes <> {
        value = new Object|owner|[5]|this|();
    }
}

[FormalContextParameters("owner")]
public class ArrayExample : IOwnership {
    private IOwnership _Context_owner = OwnershipHelpers.world;
    public Object[] value;

    [ReadEffects(), WriteEffects()]
    public ArrayExample() {
        value = new Object[5];
        this.AddChild(value);
    }
    [ReadEffects(), WriteEffects()]
    public ArrayExample(IOwnership _Context_ownerparm) {
        _Context_owner = _Context_ownerparm;
        value = new Object[5];
        this.AddChild(value);
    }
    ...
}
Listing 6.21: An example of the creation of an array object in Zal and how the ownership of the array is set. Notice that the standard AddChild method is called to set the ownership of the array. The AddChild method calls the SetParents method on object which in turn calls the SetParents extension method on the System.Array type.
6.4.1.5 Statics
The handling of static fields, methods and properties is one of the most complicated
aspects of the runtime ownership system. The syntax and semantics of the various
static language features in Zal was discussed in Section 4.4. This section focuses on
how the syntax and semantics of Zal can be implemented in the C] code emitted by
the compiler.
Fields
In C] static fields on generic types are specific to the instance of the generic type named
since the generic type parameters can be used to construct the types of static fields.
For consistency, I allowed context parameters on classes to be used to construct the
types of static fields. This means that it is necessary to make static fields specific to
the instance of the type named; a different static field is associated with each set of
actual context parameters.
Because C] does not have any context parameters, it is necessary to implement static
fields on classes with context parameters as methods which accept contexts, in the form
of IOwnership objects, which are then used to lookup or set the value of the static
field for the particular context parameters supplied. All of the different static fields
are stored in a Map where the key is the list of context parameters supplied and the
value is the value of the static field for the given contexts. Because of this choice of
implementation, Zal prohibits the use of static fields with ownership information as
ref parameters, since the implementation I have chosen is not compatible with them. Other more
complex implementations could overcome this limitation.
To help implement static fields, I wrote a special ContextMap class which is a specialized
Dictionary. It accepts a list of context parameters as a key to lookup and set values in
the dictionary. The implementation of the ContextMap class is shown in Listing 6.22.
The key to the implementation is ContextStruct’s override of Equals.
Each static field in a class parameterized with context parameters is converted into a
static ContextMap. Reads of the field are transformed into calls to a static accessor
method which accepts a list of context parameters and returns the field value stored
in the ContextMap for the particular set of context parameters supplied. Similarly,
writes are transformed into calls to a modifier method which accepts a list of context
parameters and a value and it stores the specified value in the ContextMap under the
list of context parameters supplied. An example of the implementation of a static field
is shown in Listing 6.23. In this example, the static field is public so the field is retained
in the implemented version of the code for backwards compatibility reasons, but the
compiler will always use the get and set methods when the field is accessed with actual
context parameters supplied on the type.
Methods
public class ContextMap<T> {
    private struct ContextStruct {
        IOwnership[] contextParams;

        public ContextStruct(params IOwnership[] contexts) {
            contextParams = contexts;
        }

        public override bool Equals(object obj) {
            if (obj is ContextStruct) {
                if (contextParams.Length !=
                    ((ContextStruct)obj).contextParams.Length)
                    return false;
                for (int i = 0; i < contextParams.Length; ++i)
                    if (!contextParams[i].isSameAs(
                            ((ContextStruct)obj).contextParams[i]))
                        return false;
                return true;
            }
            return false;
        }

        // A matching GetHashCode override is required for the struct to work
        // correctly as a dictionary key; context identity hashes are combined
        // because isSameAs compares contexts by reference.
        public override int GetHashCode() {
            int hash = 17;
            foreach (IOwnership context in contextParams)
                hash = hash * 31 + (context == null ? 0 : context.GetHashCode());
            return hash;
        }
    }

    private Dictionary<ContextStruct, T> values;

    public ContextMap() {
        values = new Dictionary<ContextMap<T>.ContextStruct, T>();
    }

    public T this[params IOwnership[] contexts] {
        get {
            return values[new ContextMap<T>.ContextStruct(contexts)];
        }
        set {
            values[new ContextMap<T>.ContextStruct(contexts)] = value;
        }
    }

    public bool Contains(params IOwnership[] contexts) {
        return values.ContainsKey(new ContextMap<T>.ContextStruct(contexts));
    }
}
Listing 6.22: The implementation of the ContextMap data structure used to implement static fields in classes parameterized by context parameters.
Static methods, like static fields, may make use of the formal context parameters of their
containing type just like instance methods. Instance methods are able to read the type’s
actual context parameters from the instance fields which were initialized when the type
was instantiated. The static methods do not have access to these fields since they are not
associated with a particular instance of the class. This means that the actual context
parameters specified on the type construction on which the static method is invoked
need to be passed through to the method body in addition to any method-level context
parameters. Listing 6.24 shows how a static method is implemented in C] by the Zal
compiler. Notice that two versions of the static method are generated. The first version
public class Example[owner] {
    public static Example|owner| val;
}

[FormalContextParameters("owner")]
public class Example {
    public static Example val;
    private static ContextMap<Example> _val_value =
        new ContextMap<Example>();

    public static Example get_static_val(IOwnership _Context_owner) {
        if (!_val_value.Contains(_Context_owner))
            return default(Example);
        return _val_value[_Context_owner];
    }
    public static void set_static_val(Example value,
                                      IOwnership _Context_owner) {
        _val_value[_Context_owner] = value;
    }

    private IOwnership _Context_owner = OwnershipHelpers.world;
    public Example(IOwnership _Context_ownerparm) {
        _Context_owner = _Context_ownerparm;
    }
    ...
}
Listing 6.23: An example of the implementation of a Zal static field in C]. Notice that because the static field could be accessed from outside the class the plain field is retained for backwards compatibility, but getter and setter methods are supplied for context aware code.
has no context parameters and is implemented for backwards compatibility purposes.
The second has call parameters for the enclosing type’s actual context parameters as
well as any call parameters required for the method’s context parameters.
public static class Example[data] {
    public static void operation() {
        ...
    }
}

[FormalContextParameters("data")]
public static class Example {
    // Version retained for backwards compatibility: a default context is
    // supplied as a local variable at the start of the method body.
    [ReadEffects(...), WriteEffects(...)]
    public static void operation() {
        IOwnership _Context_data = OwnershipHelpers.world;
        ...
    }

    // Ownership aware version: the enclosing type's actual context
    // parameter is passed as an additional call parameter.
    [ReadEffects(...), WriteEffects(...)]
    public static void operation(IOwnership _Context_data) {
        ...
    }
}
Listing 6.24: An example of how a static method is implemented in C] when the containing type has formal context parameters.
Properties
Properties are C] syntactic sugar for accessor and mutator methods which allow the
methods to be invoked using a field style notation. Static properties are, therefore, syn-
tactic sugar for static methods. These static properties, like static methods, can make
use of the context parameters from the enclosing type. As was the case with methods,
these context parameters need to be marshalled into the static “method” by way of ad-
ditional call parameters. Unfortunately, properties do not support explicit parameters.
This means that static methods need to be written to mirror the functionality of the
static property so that the context parameters can be marshalled through. Listing 6.25
shows an example of a static property written in Zal and its implementation in C].
public static class Example[data] {
    public static Example|data| Instance {
        get { ... }
        set { ... }
    }
}

[FormalContextParameters("data")]
public static class Example {
    [GetReadEffects(...), GetWriteEffects(...)]
    public static Example get_Instance(IOwnership _Context_data) {
        ...
    }
    [SetReadEffects(...), SetWriteEffects(...)]
    public static void set_Instance(Example value,
                                    IOwnership _Context_data) {
        ...
    }
}
Listing 6.25: An example of the implementation of a Zal static property in C]. The original property can be optionally retained for use by existing C] programs, but is omitted for clarity from the listing above. The get and set methods are used by ownership aware code to marshall context parameters to the accessor implementations.
6.4.2 Enhanced Foreach Loop
The foreach loop in C] operates only on collections and other data sources which im-
plement the IEnumerable interface. In Section 2.4.1.2 I proposed an enhanced foreach
loop which exposes the ability to update items in collections being traversed as well
as providing access to an index of traversal which represents where in the collection
the current item being processed is located. Like the traditional foreach loop, the en-
hanced foreach loop can operate only on data sources and collections which implement
a specific interface used to implement the loop during code generation. This section
describes the implementation of enhanced foreach loops and the interfaces involved.
The body of an enhanced foreach loop can be thought of as an anonymous method
which accepts as parameters the element being processed and, optionally, the index of
the element being processed. The enhanced foreach loop has two optional features:
the ref keyword which enables write-through updates of the collection element, and
the loop index. This means there are four combinations of these features which produce
four different loop body method signatures as shown in Table 6.8. There are three loop
body delegates which need to be accepted by enhanced foreach loop implementations.
In the case where neither of these optional enhancements is employed, the loop is a
traditional foreach loop and can be implemented as usual without any further special
handling.
ref option   index option   Delegate
    X             X         void Body<T, I>(ref T element, I index)
    X                       void BodyNoIndex<T>(ref T element)
                  X         void BodyNoRef<T, I>(T element, I index)
                            void BodyStandard<T>(T element)*
Table 6.8: Enhanced foreach loop body delegates based on the optional enhancements declared in the loop header. *Note that the loop body without either of the optional enhancements is a traditional foreach loop and can be handled using the IEnumerable interface as usual.
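Expressed as C] delegate declarations (a sketch based directly on the signatures in Table 6.8; the declarations in the enhanced foreach library may differ in detail), the loop body delegates are:

public delegate void Body<T, I>(ref T element, I index);
public delegate void BodyNoIndex<T>(ref T element);
public delegate void BodyNoRef<T, I>(T element, I index);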
In order for a collection to support the enhanced foreach loop, it needs to implement
at least one of the three methods shown in the list below. Only the methods sufficient
for the specific use of the collection in question are required.
• void EnhancedLoop<T, I>(Body<T, I> body)
• void EnhancedLoop<T>(BodyNoIndex<T> body)
• void EnhancedLoop<T, I>(BodyNoRef<T, I> body)
These methods execute the loop sequentially, and ensure the semantics expected by
the body delegate, with regard to updates to the element being processed. If the ref
keyword is present on the element parameter, assignment to the element passed to the
body delegate should be reflected in the collection.
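As a hypothetical illustration of the write-through semantics (using the IList implementation shown later in Listing 6.26), the following fragment doubles every element of a list in place through the ref parameter:

List<int> values = new List<int> { 1, 2, 3 };
values.EnhancedLoop(delegate(ref int element, int index) {
    element = element * 2;
});
// values now contains { 2, 4, 6 }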
To support parallel execution of an enhanced foreach loop, there are three additional
methods which a collection may implement. As with the EnhancedLoop methods, only
those needed for the specific use of the collection are required.
• void ParallelEnhancedLoop<T, I>(Body<T, I> body)
• void ParallelEnhancedLoop<T>(BodyNoIndex<T> body)
• void ParallelEnhancedLoop<T, I>(BodyNoRef<T, I> body)
These methods should execute the loop in parallel and respect the semantics expected
by the body delegate with regards to element updates.
I provided implementations of all six of these methods for the IList and IDictionary
interfaces as part of the enhanced foreach loop library. In Listing 6.26 I show two of
the EnhancedLoop implementations to show how these methods are implemented for
these collections.
public static void EnhancedLoop<T>(this IList<T> collection,
                                   Body<T, int> body) {
    for (int i = 0; i < collection.Count; ++i) {
        T tempElem = collection[i];
        body(ref tempElem, i);
        collection[i] = tempElem;
    }
}
public static void EnhancedLoop<K, V>(this IDictionary<K, V> dictionary,
                                      Body<V, K> body) {
    foreach (K key in dictionary.Keys) {
        V value = dictionary[key];
        body(ref value, key);
        dictionary[key] = value;
    }
}
Listing 6.26: Sample implementations of EnhancedLoop for the IList and IDictionary interfaces.
The enhanced foreach loop library I have written also supplies two different interfaces
which new collections or data sources may choose to implement (IEnhancedEnumerable
and IIndexedEnumerable). Both of these interfaces have the full set of EnhancedLoop
and ParallelEnhancedLoop methods implemented. The difference between these in-
terfaces is how the index values are generated. Listing 6.27 shows these two interfaces
and Listing 6.28 shows sample of how the EnhancedLoop methods for these interfaces
are implemented.
public interface IEnhancedEnumerable<I, T> {
    IEnumerable<I> GetIndices();
    IEnumerable<T> GetValues();
    void SetValue(I index, T value);
}
public interface IIndexedEnumerable<T> {
    int Start { get; }
    int End { get; }
    T this[int index] { get; set; }
}
Listing 6.27: The two interfaces supplied with the enhanced foreach loop library which collections can implement so they can be used with the enhanced foreach loop.
6.5 Parallelization
The goal of the entire type and effect system implemented by the compiler I have written
is reasoning about parallelism. In Section 3.5, I realized sufficient conditions for the safe
application of three different parallelism patterns in terms of context relationships. The
three patterns discussed were task parallelism, data parallelism, and loop pipelining.
In this section, I examine how the compiler tests for the sufficient conditions for these
patterns, how it generates runtime context relationship checks to facilitate conditional
parallelism when required relationships cannot be statically verified, and how and when
the different parallelism patterns are applied.
6.5.1 Context Relationship Testing
When testing the relationship between two contexts at compile time, the comparison
may have one of three outcomes: the relationship is known to hold, the rela-
tionship is known not to hold, or the relationship is unknown. When a parallelization
condition, tested by the compiler, contains at least one relationship test which evaluates
to unknown, it may be desirable to build a set of context relationship conditions, which
if satisfied at runtime, would allow the parallel version of the code to execute safely.
In the compiler, I have constructed a data type, ConstraintList, which stores con-
text relationship information. This data type can be used to store either relationships
which are known to hold, or relationships which would be sufficient to allow for the safe
public static void EnhancedLoop<T, I>(this IEnhancedEnumerable<I, T> source,
                                      Body<T, I> body) {
    IEnumerator<I> indexItr = source.GetIndices().GetEnumerator();
    IEnumerator<T> valuesItr = source.GetValues().GetEnumerator();
    while (indexItr.MoveNext() && valuesItr.MoveNext()) {
        T temp = valuesItr.Current;
        body(ref temp, indexItr.Current);
        source.SetValue(indexItr.Current, temp);
    }
}
public static void ParallelEnhancedLoop<T>(this IIndexedEnumerable<T> source,
                                           Body<T, int> body) {
    // Start and End are treated as inclusive bounds here.
    Parallel.ForEach(Enumerable.Range(source.Start, source.End - source.Start + 1),
                     (int i) => {
        T temp = source[i];
        body(ref temp, i);
        source[i] = temp;
    });
}
Listing 6.28: Sample implementations of the EnhancedLoop method for the library supplied IEnhancedEnumerable and the ParallelEnhancedLoop method for IIndexedEnumerable. The ParallelEnhancedLoop makes use of the Microsoft Task Parallel Library (TPL) parallel foreach loop implementation (see Section 6.5.2.1).
application of a parallelizing code transformation. I chose not to store this informa-
tion within Context objects because relationships between contexts may vary between
different lexical scopes within a user defined type. This design makes handling these
changing relationships more straightforward in the compiler.
Listing 6.29 shows the methods on the ConstraintList. Note that if a relationship
is added to a ConstraintList which would violate an existing constraint then the
relationship is said to be unsatisfiable. If this happens at runtime, then the program
has reached an inconsistent state and an appropriate error is generated by the runtime
system.
The most interesting of the methods in the interface shown in Listing 6.29 is the
conditionallyExecute method. This method generates conditionally parallel imple-
mentations of the parallelism patterns recognized by the compiler.
During effect computation, an OwnershipEnv object is passed into the heap effect
computation methods to provide context information to the effect computation pro-
cess. One of the pieces of information the OwnershipEnv carries is all statically known
context relationships. This context information is used when determining the relation-
ships between two or more ownership contexts. If one or more of the relationships
required by the sufficient conditions is not known to hold in the OwnershipEnv then a
public class ConstraintList {
    // context relationship tests
    public bool IsDominatedBy(Context dominee, Context dominator);
    public bool IsDominatedByOrEq(Context dominee, Context dominator);
    public bool IsDisjointFrom(Context context1, Context context2);
    public bool IsEqual(Context context1, Context context2);

    // add context relationships to the constraint list
    public bool AddDomination(Context dominee, Context dominator);
    public bool AddDominationOrEquality(Context dominee, Context dominator);
    public bool AddEquality(Context context1, Context context2);
    public bool AddDisjoint(Context context1, Context context2);

    // generate runtime context relationship tests
    public Statement conditionallyExecute(Statement parallelVersion,
                                          Statement sequentialVersion);
}
Listing 6.29: The ConstraintList interface showing context relationship addition, testing, and runtime test generation methods.
ConstraintList object is constructed to hold the conditions which need to be tested
at runtime. The basic pattern for using the ConstraintList is as follows:
1. Test if the context relationship(s) called for by the sufficient conditions is known
to hold based on the current OwnershipEnv environment.
2. If the context relationship is unknown, add it to the ConstraintList.
3. If an exception is thrown when adding the relationship, the sufficient condition
cannot be satisfied.
Once all context relationships for a sufficient condition have been met or added to the
ConstraintList, the conditionallyExecute method can be used to generate code to test
the context relationships at runtime to see if the relationships hold.
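As a hypothetical sketch of this pattern (the environment query and the context and statement variable names are illustrative only), the compiler's handling of a required disjointness might look like this:

ConstraintList constraints = new ConstraintList();
Statement result;
if (env.IsKnownDisjoint(readContext, writeContext)) {
    // The relationship is statically known: emit the parallel version directly.
    result = parallelVersion;
} else {
    try {
        // Unknown at compile time: record the relationship for a runtime test.
        constraints.AddDisjoint(readContext, writeContext);
        result = constraints.conditionallyExecute(parallelVersion,
                                                  sequentialVersion);
    } catch (Exception) {
        // Adding the relationship contradicted known constraints:
        // the sufficient condition cannot be satisfied.
        result = sequentialVersion;
    }
}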
The conditionallyExecute method takes as parameters two different statements:
the parallel and sequential versions of the code. The method takes all of the con-
text constraints stored in the ConstraintList and generates an if statement with a
condition testing to see if the stored relationships hold. If they hold at runtime, the
parallelVersion will be executed otherwise the sequentialVersion will be run. The
runtime context relationship tests are made at runtime using one of several different
extension methods which each test a specific context relationship. The calls to these
methods are emitted by the compiler as part of the condition on a conditionally paral-
lelized code block. Listing 6.30 shows these operators as implemented in the runtime
ownership library.
public static bool isDominatorOf(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    if (lhsDepth >= rhsDepth)
        return false;
    object toCheck = rhs;
    for (int i = rhsDepth; i > lhsDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck == lhs;
}
public static bool isDescendentOf(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    if (lhsDepth <= rhsDepth)
        return false;
    object toCheck = lhs;
    for (int i = lhsDepth; i > rhsDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck == rhs;
}
public static bool isDisjointFrom(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    object top = lhs, bottom = rhs;
    int topDepth = lhsDepth, bottomDepth = rhsDepth;
    if (rhsDepth < lhsDepth) {
        top = rhs; bottom = lhs;
        topDepth = rhsDepth; bottomDepth = lhsDepth;
    }
    // walk the deeper context up to the depth of the shallower one
    object toCheck = bottom;
    for (int i = bottomDepth; i > topDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck != top;
}
public static bool isSameAs(this object lhs, object rhs) {
    return lhs == rhs;
}
Listing 6.30: The extension methods in the runtime ownership library used to test the relationships between arbitrary contexts.
6.5.2 Implementation
Having discussed how the compiler tests the sufficient conditions for parallelization, it
remains to discuss when and how the compiler exposes inherent parallelism. Determin-
ing when to explicitly expose inherent parallelism to improve performance is difficult.
I assume there would be some oracle that could be consulted which would tell the
compiler when parallelization will result in a net increase in application performance.
Unfortunately, such an oracle is beyond the current state of the art; building one is
an important open research question outside the scope of this thesis.
Since no parallelization oracle exists, the current compiler defaults to maximum paral-
lelization. As a result, the compiler attempts to parallelize all foreach and enhanced
foreach loops in a program. The programmer can manually review the results of
this parallelization and selectively revert loops to sequential operation where required.
The compiler has analysis and code transformation methods to recognize and expose
task parallelism, but at present these are disabled by default due to the lack of a
parallelization oracle. The parallelization of a code block is performed by invoking the
Parallelize method on the block. The method returns an AST for the parallel version
of the original AST sub-tree rooted at the node on which it was run.
The compiler currently uses the Microsoft Task Parallel Library (TPL) to implement
the task and data parallelism patterns [80]. The TPL provides constructs for data and
task parallelism and is designed to make it easy to write parallel programs, but it does
not provide any validity checking.
6.5.2.1 Data Parallelism
I have developed sufficient conditions for two loop parallelism patterns: data parallelism
and pipelining. The TPL provides direct support for the implementation of the data
parallelism pattern in the form of the Parallel.ForEach method. This method accepts
an enumerable data source and a method which is the loop body and which accepts a
single element from the enumerable data source as a parameter.
When the sufficient conditions for the parallelization of a sequential data parallel loop
are met, the loop body is made the body of the method passed to the Parallel.ForEach
method with the loop’s collection as shown in Listing 6.31. It is also used as part of
the supplied implementations of the ParallelEnhancedLoop methods as shown in List-
ing 6.32.
// Sequential Loop
foreach (Element elem in collection)
{ /* loop body */ }

// Parallel Loop
Parallel.ForEach(collection, (Element elem) =>
{ /* loop body */ });
Listing 6.31: foreach loop parallelization using the TPL’s Parallel.ForEach method.
An example of a ParallelEnhancedLoop implementation using the TPL Parallel.ForEach
loop is shown in Listing 6.32.
public static void ParallelEnhancedLoop(this IEnhancedEnumerable collection,
                                        Body<object, object> body) {
    Parallel.ForEach(collection.GetIndices(), (object index) => {
        object tempElem = collection[index];
        body(ref tempElem, index);
        collection[index] = tempElem;
    });
}
Listing 6.32: An example of how a parallel enhanced foreach loop would be imple-mented.
In cases when the compiler cannot statically determine if the sufficient conditions for
loop parallelization are met, the compiler emits a conditionally parallel implementation
of the loop. An example of such a conditional implementation is shown in Listing 6.33.
if (/* context relationship test */) {
    // Parallel Loop
    Parallel.ForEach(collection, (Element elem) => {
        // loop body
    });
} else {
    // Sequential Loop
    foreach (Element elem in collection) {
        // loop body
    }
}
Listing 6.33: Conditional foreach loop parallelization using the TPL's Parallel.ForEach method.
6.5.2.2 Pipelining
Section 2.4.2 described loop pipelining as a technique to stage the execution of loop
iterations to extract a limited amount of parallelism when full data parallelism is not
possible due to loop carried dependencies. There are a number of existing algorithms
for scheduling loop bodies for pipelined execution. My effect system can be used to
compute the data dependency information required for these pipelining algorithms.
Once the loop has been scheduled for pipelined execution there are a number of different
techniques for implementing the pipeline stages.
At present, the Zal compiler uses a simple algorithm to detect the dependencies between
statements in the loop body. This information is then used to break the statements
in the loop body into stages. Having identified the pipeline stages, the compiler then
implements the pipeline using a custom library I have written, which treats the pipeline
as a sequence of concurrently executing stages. Each stage consumes input from a buffer
or stream source and produces output to an output buffer. The buffers between stages
serve to ensure that pipeline stages do not need to block waiting for subsequent stages
to consume their next input values. The loop stages are connected to form a sequence,
based on the dependencies between stages, if any. This is not the only way to detect
and schedule a loop for pipeline execution, but it serves to demonstrate the practicality
of the approach.
The pipeline library supplies two publicly accessible types: Pipeline and
StageAction<InType, OutType>. The StageAction<InType, OutType> is a delegate
which represents the action performed by a pipeline stage. The Pipeline class provides
static methods used to construct pipelines. The full source code for the pipelining
library is available from [32]. Listing 6.34 shows an example of creating a pipeline which
processes an image through several transformations. Each of the pipeline addition
stages accepts a StageAction delegate whose input type is the same as the output type
of the previously constructed stage and whose output type can be anything desired. The
final Run method starts the pipeline and blocks waiting for it to finish execution. The
pipeline output is accessible through the IEnumerable returned by the Run method.
// Zal loop
foreach (ImageSlice|o| s in img) {
    noiseFilter.reduce(s);
    contrastFilter.balance(s);
    edgeDetector.findEdges(s);
    s.setImageType();
}

// pipelined version of the Zal loop
Pipeline.AddFirstStage((ImageSlice|o| s) => noiseFilter.reduce(s), img)
        .AddStage((ImageSlice|o| s) => contrastFilter.balance(s))
        .AddStage((ImageSlice|o| s) => edgeDetector.findEdges(s))
        .AddStage((ImageSlice|o| s) => s.setImageType())
        .Run();
Listing 6.34: An example of using the pipelining library to create a pipeline; each stage only modifies the ImageSlice and the representation of the filter or detector in that stage, if any.
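To make the buffering between stages concrete, the following is a minimal sketch of how one concurrently executing stage might be driven. It assumes BlockingCollection-based buffers and a StageAction-style delegate; these are illustrative choices, not the actual internals of the pipelining library.

// Minimal sketch of one pipeline stage (illustrative only; the library's real
// internals may differ). The stage consumes items from its input buffer,
// applies the stage action, and writes results to its output buffer so that
// upstream stages never block waiting for downstream consumers.
using System.Collections.Concurrent;
using System.Threading.Tasks;

public delegate TOut StageAction<TIn, TOut>(TIn input);

public static class StageRunner
{
    public static Task RunStage<TIn, TOut>(BlockingCollection<TIn> input,
                                           BlockingCollection<TOut> output,
                                           StageAction<TIn, TOut> action)
    {
        return Task.Factory.StartNew(() =>
        {
            // Blocks only when the input buffer is empty and not yet completed.
            foreach (TIn item in input.GetConsumingEnumerable())
                output.Add(action(item));
            // Signal the downstream stage that no further items will arrive.
            output.CompleteAdding();
        });
    }
}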
6.5.2.3 Task Parallelism
The TPL provides a number of constructs which simplify the implementation of task
parallelism. The compiler makes use of the Task<TResult> class which represents a
future — an asynchronously executed task which produces a result, the reading of
which is treated as a synchronization point. The compiler does not automatically
employ these transformations as it does not currently have a means of determining
when it is beneficial to do so; however, it does have all of the code implemented to
generate task parallel implementations when necessary. Listing 6.35 shows an example
of how task parallelism is implemented using a TPL Task as a future.
// Zal code
int x = computeValue();
...
int y = x;

// parallelized version
Task<int> x = Task<int>.Factory.StartNew(() => computeValue());
...
int y = x.Result;
Listing 6.35: The implementation of task parallelism using a TPL Task.
6.6 Summary
In this chapter I have presented how I refactored and extended the GPC] compiler
written by our research group. I also discussed how the infrastructure for generic type
parameters was abstracted and used to implement support for the context parameters.
The key point was creating interfaces for formal and actual type parameters and adding
classes to hold multiple lists of type parameters. This infrastructure greatly simplified
the task of adding context parameters to the language.
In addition to adding context parameters to the C] language, I added effect decla-
rations to the language and an effect computation pass in the compiler. The effects
computed by the compiler can then be used to reason about inherent parallelism as
was discussed in Chapter 3. The sufficient conditions applied by the compiler are the
same as previously discussed. The major implementation detail discussed was how the
runtime context relationship tests are generated to facilitate conditional parallelization.
The Zal compiler is a source-to-source compiler which implements runtime ownership
tracking along with Zal semantics in C]. I have discussed in detail these different
transformations, including the need for special handling of statics. I have also presented
the design details of runtime libraries I have written to simplify the implementation of
runtime ownership tracking and context relationship testing in C].
Chapter 7
Validation
In the preceding chapters, I have presented a system for reasoning about side-effects
and inherent parallelism in modern imperative object-oriented languages. The proposed
effect system is abstract and composable which facilitates reasoning about side-effects
in third-party libraries and other code in separate compilation units. In Chapter 5
I provided an argument, based on existing work, for the safety and soundness of the
language I have proposed. I also sketched proofs that the sufficient conditions I have
proposed preserve sequential consistency. It remains to demonstrate that my proposed
system works on realistic programs.
When trying to parallelize a sequential program, there are three main steps:
1. find the inherent parallelism in the program,
2. decide which inherent parallelism is worth exploiting, and
3. choose an implementation technology to expose the selected parallelism.
All three of these steps are difficult and there are a number of important open research
questions relating to each of them. This thesis is focused on step 1, how to find
inherent parallelism; I do not claim any contributions to step 2 or step 3. My system
will, therefore, try to identify as much inherent parallelism as possible, even if it is
not worth exploiting. The key point is that my system can identify available inherent
parallelism automatically.
The overall goal of this chapter is to demonstrate how well my proposed system works.
I aim to demonstrate, through the application of my system to realistic examples, that
1. it finds the major sources of inherent parallelism in the examples,
2. the amount of program annotation required, in terms of lines of code and effort,
is reasonable, and
3. the runtime and memory overheads of the dynamic system are reasonable in
relation to the benefit the system provides.
When working with examples of parallelization, there is always a tendency to focus on
achieving an overall performance improvement for the system in question. While I do
present performance numbers to quantify overheads, speed-up is not a direct measure
of the success of my approach because, as previously stated, I am contributing only to
one of the three steps of the parallelization process.
The remainder of this chapter presents results obtained from four worked examples
which are representative of a broad range of programs employing a variety of different
types of parallelism. Rather than presenting each example in isolation, I will present
the examples collectively, dividing the discussion up into the different steps involved in
performing the validation. I begin by introducing the examples and then proceed
to: annotate the example programs to produce a Zal program, compile the annotated
programs, and, finally, determine if the major sources of parallelism have been identified
and measure the overheads of the runtime system.
7.1 Test Platform
The system used to generate the results I present in this chapter has an Intel Core i7
Q720M quad-core CPU running at 1.6 GHz, 4 GB of RAM, and two 320 GB 7200 RPM
SATA II hard disks in a non-RAID configuration. The Core i7 processor features Intel's
Hyper-Threading Technology and so presents 8 logical processing units to the oper-
ating system while having only 4 physical cores. The system runs Windows 7 64-bit
Professional Edition with the release versions of the .NET Framework 4 and Visual
Studio 2010.
7.2 The Examples
This chapter focuses on four specific examples: a ray tracer, a calculator, an example of
the spectral methods parallelism dwarf, and a bank transaction processing system. In
this section, I introduce each of these examples in detail and provide code snippets of the
most interesting and relevant parts (from an ownership and parallelism perspective)
of the examples. The complete source for these sample applications is available from
the MQUTeR website along with the Zal compiler [32].
Some readers may observe that these examples are traditionally regarded as programs
written by specialist programmers. While this is true, these programs are representative
of much broader classes of programs which are written by non-specialist programmers.
These examples, therefore, serve the purpose of demonstrating the operation of the
system when applied to real problem types.
7.2.1 Ray Tracer
The first example I have chosen is a ray tracing application taken from Microsoft’s
“Samples for Parallel Programming with the .NET Framework 4” [82]. This ray tracer
renders a simple animated scene of a ball bouncing up and down as shown in Figure 7.1.
Note that the scene includes reflective surfaces which increase the computational com-
plexity of the rendering.
While the ray tracer is embarrassingly parallel by design, discovering that parallelism in
a sequential implementation is not trivial due to the complexity of the methods needed
to perform the rendering. In this example, the main source of parallelism is the loop
which traces a ray of light for each pixel in the rendered scene. The Microsoft supplied
implementation has two versions of this rendering loop. The first uses two nested
loops to traverse the screen pixels sequentially computing the color for each pixel in
the resulting scene as shown in Listing 7.1. The second version uses the parallel for
loop from the .NET Framework 4 to parallelize the outer for loop from the sequential
implementation.
Figure 7.1: The scene rendered by the ray tracer from Microsoft's Samples for Parallel Programming with the .NET Framework 4; note the reflective surfaces which increase the rendering complexity.
internal void Render(Scene scene, int[] scr) {
    Camera camera = scene.Camera;
    for (int y = 0; y < screenHeight; y++) {
        int stride = y * screenWidth;
        for (int x = 0; x < screenWidth; x++) {
            scr[x + stride] = TraceRay(new Ray(camera.Pos, GetPoint(x, y, camera)),
                                       scene, 0).ToInt32();
        }
    }
}
Listing 7.1: The original C] Render method with its doubly-nested for loop.
7.2.2 Calculator
Tree traversal algorithms are found in a number of areas of computer science including
search, compilers, and databases. Many tree traversal algorithms, when written se-
quentially, contain at least some inherent parallelism and, in many cases, this inherent
parallelism takes the form of task parallelism. Because of the prevalence and utility
of tree data structures and the algorithms for their traversal, I have chosen a simple
calculator application to demonstrate the detection and exploitation of task parallelism
by my system.
The example calculator application reads in simple mathematical expressions written
using prefix notation and evaluates them. This is done by reading in an expression as
a string, scanning the string to produce tokens, parsing the token stream to produce
an Abstract Syntax Tree (AST), and then using the AST to compute the value of the
expression. The computation of the value of the expression can be achieved by using a
post-order traversal of the AST.
The major source of inherent parallelism in this example is found in the traversal of
the calculator’s AST nodes with two or more children; the traversal of the children can
be carried out in parallel. An example of such a node in the calculator example is the
BinaryOperator AST node which represents binary arithmetic operators such as +
and −. Listing 7.2 shows the implementation of the abstract AST Node class as well
as the BinaryOperator node.
abstract class Node {
    public abstract int Compute();
}

class BinaryOperation : Node {
    char opType;
    Node left, right;

    public BinaryOperation(char opType, CalculatorToken[] left,
                           CalculatorToken[] right) {
        this.opType = opType;
        this.left = CalculatorParser.Parse(left);
        this.right = CalculatorParser.Parse(right);
    }

    public override int Compute() {
        int left = this.left.Compute();
        int right = this.right.Compute();
        switch (opType) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
        }
        throw new InvalidOperationException();
    }
}
Listing 7.2: The C] implementations of the calculator AST Node class and BinaryOperator class.
7.2.3 Bank Transaction System
There are often cases where full loop parallelism is not possible due to data depen-
dencies. It may be possible, in these cases, to extract some parallelism through the
use of pipelining to stagger the execution of the loop body. I wrote a simplified bank
transaction processing application as an example of this style of parallelism.
The simplified bank in this transaction system consists of a vault full of accounts, an
authentication system, and transactions. The bank has a method which accepts a list
of transactions which transfer money between two accounts. This method traverses
the list of transactions with a foreach loop applying the changes to the accounts in
the vault after validating the transaction. Listing 7.3 shows the Transaction structure
and the method which processes the transactions.
7.2.4 Spectral Methods
Fine-grained parallelism is another important form of parallelism and while my system
has not been specifically designed to detect and exploit this form of parallelism, there
are still a number of cases where it can do so. To demonstrate this, I now present an
example of an application which performs two 1-Dimensional Fast Fourier Transforms
(FFTs) on a matrix of values. This application is one of the examples provided as part
of the Parallelism Dwarfs project [120].
In this example, the main source of parallelism comes from the application’s FFT
computation loop which is shown in Listing 7.4. The Parallel Dwarfs project supplies
a number of different implementations of the same algorithm including a sequential
C] version and a C] version written using the .NET Framework 4’s Task Parallel Li-
brary [80].
struct Transaction {
    AccountInfo src, dest;
    int amount;
    bool authenticated = false;
    bool validated = true;

    public Transaction(string srcAccountCode, string srcAccountPIN,
                       string destAccountCode, int amount) {
        src = new AccountInfo(srcAccountCode, srcAccountPIN);
        dest = new AccountInfo(destAccountCode, null);
        this.amount = amount;
    }

    public void Authenticate(Authenticator auth) {
        authenticated = auth.Authenticate(src.AccountCode, src.AccountPIN);
    }

    public void ValidateSource(Vault vault) {
        validated = validated && Account.VerifyAccountCode(src.AccountCode);
    }

    public void ValidateDest(Vault vault) {
        validated = validated && Account.VerifyAccountCode(dest.AccountCode);
    }

    public void Apply(Vault vault) {
        if (authenticated && validated)
            vault.Transfer(src.AccountCode, dest.AccountCode, amount);
    }
}

class Bank {
    ...
    public void Apply(List<Transaction> transactions) {
        foreach (Transaction transaction in transactions) {
            transaction.Authenticate(auth);
            transaction.ValidateSource(vault);
            transaction.ValidateDest(vault);
            transaction.Apply(vault);
        }
    }
}
Listing 7.3: The original C] implementation of the bank transaction system's Transaction and the Bank's transaction processing method.
struct Complex { public double real, imag; ... }

class Solver {
    public int length;
    Complex[][] complexArray;

    public void Solve() {
        // Transform the rows
        foreach (Complex[] row in complexArray)
            row.FFT();
        complexArray.transpose();

        // Transform the columns
        foreach (Complex[] row in complexArray)
            row.FFT();
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    public static void FFT(this Complex[] complexLine) {
        Complex[] W = new Complex[length];
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}
Listing 7.4: A fragment of the sequential C] Spectral Methods example's key data structures and computational methods.
7.3 The Annotation Process
Having selected the C] applications to use as examples, the next step was to annotate
these programs with context parameters and effect declarations. There are a number
of possible approaches to annotating a program with ownership information. In this
section, I outline the heuristic process I used, which is iterative and proceeds by
refinement. At present this is a manual process undertaken by the programmer. This
process could be as much work as manually parallelizing the program. However, this
process has two advantages: (1) it is a more mechanical process that requires less
specialized parallelism training and (2) this process could be easily adapted for use in a
compiler or integrated development environment (IDE) to relieve the programmer of at
least some of the annotation overhead. I use the code fragment shown in Listing 7.5 as
a running example throughout the following discussion to demonstrate the algorithm’s
operation.
public class Result {
    private String message;
    private Object value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}
Listing 7.5: The original sample code used as a running example to show the operation of my proposed ownership annotation heuristic.
The initial step for annotating a program is to find all of the classes which do not
reference other classes in the project. These classes gain a single context parameter
which is the owner of the class and the data it contains. In the running example
introduced in Listing 7.5, the Result class is an example of a class which does not
reference other classes in the project. It is initially annotated with a single owner
context parameter as shown in Listing 7.6.
public class Result[o] {
    private String message;
    private Object value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}

Listing 7.6: The first step of annotating the Result class, adding the owner.

A data flow analysis is performed for each field which holds an object reference to
determine if it holds a value generated by the class or a value passed in. If the field
holds only values generated by the class, its type is annotated with the current class's
this context as an owner. Otherwise, an additional context parameter is added to the
class declaration to be used as the field’s owning context. In the running example, the
value field of the Result class is annotated as being owned by an additional context
parameter since its value can be generated outside the class. The message field is
assigned to only in the Result constructor and so is owned by this (see Listing 7.7).
public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}
Listing 7.7: The completion of the Result class’s annotation.
Once all of these “low level” classes have been annotated with owners, annotation
proceeds with classes which reference only classes which have already been annotated.
Continuing the example, the ResultWrapper class is annotated with a single owner as
shown in Listing 7.8.
Next, the ResultWrapper fields are analyzed. The res field is publicly accessible, so
its value can be generated outside the current class; context parameters are therefore
added to the ResultWrapper class as shown in Listing 7.9.
public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper[o] {
    public Result res;
}

Listing 7.8: The first step of annotating the ResultWrapper class.

public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper[o,r,s] {
    public Result|r,s| res;
}

Listing 7.9: The final result of annotating the code shown in Listing 7.5.

Once the preceding naïve annotation algorithm has been run, the annotated program
has a large number of context parameters. This large number of context parameters may
be unwieldy and may not capture data relationships which the programmer knows to be
true. To reduce the number of context parameters and capture data relationships known
from the application design, each of the annotated classes is revisited and ownership
contexts merged where possible. For example, the o and v on the Result class in
Listing 7.9 might be merged if it was known that the value and Result objects are from
the same part of the program’s representation. The merging of context parameters is
a trade-off between flexibility and simplicity and so the choices made may need to be
revisited during the development of a system as it evolves.
Once the initial annotation addition and rationalization passes have been completed,
the program is compiled. The compiler output can be examined to determine where
parallelism has been found. If areas of computational intensity have not been paral-
lelized, the reasons for the parallelization failure can be determined. At present this
information is available only by manually debugging the compilation of the program,
but it would be possible to expose this information as part of the compiler’s output or
through a customized IDE. This information can assist the programmer in determining
where code needs to be refactored or where context annotations have been rationalized
excessively causing the loss of too much information.
As discussed in Section 3.4.3, the relationships between the contexts read and written
by a code fragment must be known before it can be safely parallelized. In some cases,
the compiler may not be able to statically determine the relationship between context
parameters. In these cases it emits a runtime context relationship check. These checks
can be performed efficiently, as previously discussed in Section 3.4.4. However, if the
programmer knows the relationship should hold, a context constraint can be added to
capture this fact. These constraints reduce the flexibility of the code, but eliminate the
need for runtime context relationship checks. Whether context constraints should be
used is for the programmer to decide.
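As a rough illustration of the shape of such a check, the fragment below sketches the kind of conditionally parallel code the compiler might emit when a disjointness constraint such as t # this, s cannot be established statically. The IsDisjointFrom helper and the Context_s and Context_t parameter names are assumptions made for the purpose of the example, not the actual Zal runtime API.

// Sketch only: a conditionally parallel loop guarded by a runtime context
// relationship test. IsDisjointFrom is a hypothetical helper; the real test
// uses the runtime ownership tracking library described in Chapter 6.
internal void Process(IEnumerable<Element> collection,
                      IOwnership Context_s, IOwnership Context_t)
{
    if (Context_t.IsDisjointFrom(this) && Context_t.IsDisjointFrom(Context_s))
    {
        // The runtime test succeeded, so the loop iterations cannot interfere.
        Parallel.ForEach(collection, (Element elem) => { /* loop body */ });
    }
    else
    {
        // Fall back to the original sequential loop.
        foreach (Element elem in collection) { /* loop body */ }
    }
}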
It is possible that many of the annotations required by my system could be removed
or further simplified in future through the implementation of an ownership inference
system. The initial annotation process is highly procedural and so would be quite
amenable to automation. Note that the burden of program annotation is not unique
to my system, but is common to all ownership based systems and there is a significant
amount of research aimed at reducing this burden.
7.4 The Results of Annotation
I applied the annotation process described in the previous section to the four examples
introduced in Section 7.2. In this section I begin by presenting a summary of statistics
related to the annotation process. This is followed by a discussion of the annotation of
each of the examples individually.
When adding additional syntax to a programming language, the burden imposed on
the programmer needs to be minimized and the benefit derived from the additional
information maximized. To measure the syntactic overhead of the ownership and effect
annotations required, I have calculated two statistics. The first is a measure of the
total number of lines of code that had to be modified with context parameters or effect
declarations. Table 7.1 shows the total logical lines of code per example as well as the
number and percentage of lines modified.
                               Logical Source Lines of Code
Example                        Total    Modified    Percentage (%)
Ray Tracer                       249          41             16.7%
Calculator                       161          43             26.7%
Bank Transaction System          124          28             22.5%
Spectral Methods                  81           8              9.9%
Total                            615         120             19.5%
Table 7.1: Table showing the number of logical lines of source code modified for each of the examples during the annotation process.
The second is a measure of the total number of method definitions that had to be
modified with either context parameters, effect declarations, or both. Compared to the
context annotations, the effect annotations required on virtual methods can be more
onerous for the programmer to compute. Table 7.2 shows the number of method defini-
tions in each example as well as the number and percentage of those definitions which
had to be annotated. On average, roughly a quarter of method declarations required
some form of annotation, but the complexity of these annotations varied greatly. Only
methods declared to be virtual can be overridden and so require programmer effect
declarations; the side-effects of all other methods can be computed from the method
body. This means that only virtual methods and those requiring context parameters
require annotation.
                                  Method Definitions
Example                        Total    Modified    Percentage (%)
Ray Tracer                        52           8             15.3%
Calculator                        16          11             68.8%
Bank Transaction System           20           6             30%
Spectral Methods                  10           2             20%
Total                             98          27             27.6%
Table 7.2: Table showing the number of method definitions modified for each of the examples during the annotation process.
These statistics show that the annotation overhead of my proposed system is reasonable,
particularly since the system is a prototype with a number of avenues for further simplification still open.
Having presented the statistics, I now discuss the annotation of each example individ-
ually. I focus on the code that is the major source of parallelism in the examples as
previously discussed in Section 7.2.
7.4.1 Ray Tracer
The entire ray tracing application was annotated with ownership information. The
parallelism found by the Zal compiler using my proposed sufficient conditions coincided
with the known sources of parallelism in the application.
The main source of parallelism is the Render method. This method was annotated
with two context parameters to represent the owners of the scene and the array of
pixels. In addition to these annotations, I rewrote the doubly-nested for loop as a
single enhanced foreach loop to make it more clear that the loop iterates over the
elements of the scr array. Following the initial compilation of the method, the Render
method’s loop was parallelized conditional on the array’s owner, t, being disjoint from
this and s. After exploring the design of the program, I decided to add a static context
constraint to the Render method capturing this context relationship. Listing 7.10 shows
the end result of the annotation and loop rewriting operations.
internal void Render[s,t](Scene|s| scene, int[]|t| scr)
        reads<this,s,t> writes<t> where t # this,s {
    Camera|s| camera = scene.Camera;
    foreach (ref int pixel at int Index in scr) {
        pixel = TraceRay|s|(new Ray(camera.Pos,
                    GetPoint|s|(Index % screenWidth, Index / screenWidth, camera)),
                scene, 0).ToInt32();
    }
}
Listing 7.10: The Zal implementation of the original Render method shown in Listing 7.1; note the loop has been rewritten as an enhanced foreach loop.
7.4.2 Calculator
The annotation of the Node and BinaryOperation classes in the calculator example
was undertaken following the algorithm described in Section 7.3. I added an owner
to the Node class and to the BinaryOperator class. It was also necessary to add an
additional context parameter to the BinaryOperator class to represent the owner of the
tokens used to build the object. After the initial compilation of the BinaryOperator,
I examined the calls to compute the values of the left and right expressions as part
of the Compute method. The compiler indicated that the two calls were not able to be
run in parallel because they shared the same owner. I, therefore, decided to introduce
two subcontexts to hold the subexpressions and allow task parallelism to be exploited.
Listing 7.11 shows the final annotated version of the Compute method.
abstract class Node[owner] {
    public abstract int Compute() reads <this> writes <>;
}

class BinaryOperation[owner,tokOwner] : Node|owner| {
    subcontext sub_left, sub_right;
    char opType;
    Node|sub_left| left;
    Node|sub_right| right;

    public BinaryOperation(char opType, CalculatorToken[]|tokOwner| left,
                           CalculatorToken[]|tokOwner| right)
            reads <tokOwner> writes <tokOwner> {
        this.opType = opType;
        this.left = CalculatorParser.Parse|sub_left,tokOwner|(left);
        this.right = CalculatorParser.Parse|sub_right,tokOwner|(right);
    }

    public override int Compute() reads <this> writes <> {
        int left = this.left.Compute();
        int right = this.right.Compute();
        switch (opType) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
        }
        throw new InvalidOperationException();
    }
}
Listing 7.11: The Zal implementation of the calculator AST Node class and BinaryOperator class.
7.4.3 Bank Transaction System
In the Bank Transaction System, I focused on the Bank’s Apply method because it is
the main source of parallelism in the application. I annotated this method by adding
a context parameter to the method, to represent the owner of the transaction list
supplied as a parameter. After my initial compilation of the method, the foreach
loop was implemented sequentially. Additional development revealed the need to have
separate owners for the Vault and Authenticator references, so I added the sub_v
and sub_a subcontexts. The overall result of these annotations is shown in Listing 7.12.
class Bank[owner] {
    subcontext sub_v, sub_a;
    ...
    public void Apply[listOwner](List<Transaction>|listOwner| transactions)
            reads<this,listOwner> writes<this> {
        foreach (Transaction transaction in transactions) {
            transaction.Authenticate|sub_a|(auth);
            transaction.ValidateSource|sub_v|(vault);
            transaction.ValidateDest|sub_v|(vault);
            transaction.Apply|sub_v|(vault);
        }
    }
}
Listing 7.12: The Zal implementation of the Bank’s transaction processing method.
7.4.4 Spectral Methods
The main source of parallelism in the spectral methods example is the loop which applies
a Fast Fourier Transform to each of the rows in the complexArray. Listing 7.13 shows
a snippet of the example annotated with context parameters. The two dimensional
complexArray has an owner for each of the two dimensions; one owner for the array
containing arrays and one for the arrays containing values. The data contained in the
array is part of the solver representation, but to separate the array’s structure from the
array’s data, I needed to add two subcontexts to the Solver as shown in the listing.
7.4.5 Summary
Overall, approximately a quarter of the sample programs required some form of an-
notation with the system I currently propose. This burden could be further reduced
by the addition of even a basic ownership inference system and more helpful defaults.
Given that much of the information contained in these annotations is already part of
the overall program design, I have shown that the safe automatic parallelization of
sequential programs is possible using my system; it remains to refine the techniques I
have proposed.
struct Complex { public double real, imag; ... }

class Solver[owner] {
    subcontext rowContext, colContext;
    public int length;
    Complex[]|rowContext|[]|colContext| complexArray;

    public void Solve() reads <this> writes <this> {
        // Transform the rows
        foreach (Complex[]|rowContext| row in complexArray)
            row.FFT|colContext|();
        complexArray.transpose();

        // Transform the columns
        foreach (Complex[]|rowContext| row in complexArray)
            row.FFT|colContext|();
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    public static void FFT[o](this Complex[]|o| complexLine)
            reads <this,o> writes <this> {
        Complex[]|o| W = new Complex[length]|o|;
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}
Listing 7.13: The annotated version of the Spectral Methods example.
7.5 The Results of Compilation
The Zal compiler emits C] code which adds runtime ownership tracking to the generated
classes and which exposes exploitable inherent parallelism using Microsoft’s Task Par-
allel Library (TPL) [80]. In this section I present the results of compiling the programs
annotated in the previous section. There are two main goals for this presentation: (1) to
show that the major sources of inherent parallelism in the examples are explicitly exposed
in the implementations produced by the compiler and (2) to give the reader a feel for how
Zal programs are implemented using the TPL in C] (see Chapter 6).
7.5.1 Ray Tracer
The annotated ray tracer program shown in Listing 7.10 was compiled to C] using the
Zal compiler. When the Zal compiler generates C] source code, it stores formal context
parameters and effect information in custom C] attributes. The C] compiler, in turn,
stores these attributes in the executable code it produces. The reason for storing this
information is so that if a Zal program later links against the compiled code, its
ownership and effect information is available. This facilitates both type checking and
effect computation.
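The attribute classes themselves are not shown in this thesis. The following sketch gives one plausible shape for them, consistent with how they are applied in the listings below; it should be read as an assumption rather than the actual definitions.

// Plausible definitions for the custom attributes used to persist ownership
// and effect information in the emitted assemblies (assumed, not taken from
// the Zal compiler sources).
using System;

[AttributeUsage(AttributeTargets.Method)]
public sealed class ReadEffectAttribute : Attribute
{
    public string[] Contexts { get; private set; }
    public ReadEffectAttribute(params string[] contexts) { Contexts = contexts; }
}

[AttributeUsage(AttributeTargets.Method)]
public sealed class WriteEffectAttribute : Attribute
{
    public string[] Contexts { get; private set; }
    public WriteEffectAttribute(params string[] contexts) { Contexts = contexts; }
}

[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method)]
public sealed class FormalContextParametersAttribute : Attribute
{
    public string[] Parameters { get; private set; }
    public FormalContextParametersAttribute(params string[] parameters) { Parameters = parameters; }
}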
Listing 7.14 shows the C] generated from the Zal implementation of the Render method.
Notice that the compiler has transformed the enhanced foreach loop into a parallel
version implemented as an extension method, as discussed in Section 6.4.2.
[ReadEffect("this", "s", "t"), WriteEffect("t"),FormalContextParameters("s", "t")]internal void Render(Scene Scene, int[] scr, IOwnership Context s,
IOwnership Context t) {Camera|s| camera = scene.Camera;scr.ParallelLoop((ref int pixel, int Index) =>
{pixel = TraceRay(new Ray(camera.Pos, GetPoint(
Index % screenWidth, Index / screenWidth,camera, Context s)),scene, 0, Context s).ToInt32();
});}
Listing 7.14: The C] implementation of the Zal Render method shown in Listing 7.10.
7.5.2 Calculator
Listing 7.15 shows the output of the Zal source-to-source compiler. Note the task par-
allelism that has been explicitly exposed by a source-to-source transformation applied
by the compiler to the computation of the values of the left and right subexpressions
of the BinaryOperation. Also note the subcontext fields in the emitted class.
7.5.3 Bank Transaction System
The main source of parallelism in the bank transaction system can be found in the
processing of transactions. The loop which processes transactions is not amenable to
full parallelization due to writes to shared mutable state; however, the loop is suitable
for pipelining. Listing 7.16 shows how the loop of interest is parallelized by the Zal
compiler. Notice that the loop has been transformed into a multi-stage pipeline.
[FormalContextParameters("owner","tokOwner")]class BinaryOperation : Node,IOwnership {
private int depth;private object owner;
[SubcontextField]private IOwnership Context sub right;[SubcontextField]private IOwnership Context sub left;char opType;Node left, right;[ReadEffect("this"),WriteEffect()]public override int Compute() {
int left = 0;int right = 0;Task[] tasks = new Task[]{
Task.Factory.StartNew(() => {left = this.left.Compute(); }),Task.Factory.StartNew(() => {right = this.right.Compute(); })
};Task.WaitAll(tasks);switch (opType) {
case ’+’: return left + right;case ’-’: return left - right;case ’*’: return left * right;case ’/’: return left / right;case ’%’: return left % right;
}throw new InvalidOperationException();
}...
}
Listing 7.15: The implementation of part of the Zal calculator example in C]; note the task parallelism in the Compute method.
7.5.4 Spectral Methods
Finally, I conclude by presenting the results of compiling the spectral methods example.
The compiler successfully detects that the two foreach loops applying the FFT row-
wise to the matrix can be executed in parallel and transforms the loops appropriately.
Note that the two dimensions of the complexArray in the Solver class are given
different owners; without this the sufficient conditions for parallelism could not be met
and the compiler would conservatively conclude that the loops must run sequentially. The
compiler output for the fragment of the example previously presented in Listings 7.4
and 7.13 is shown in Listing 7.17.
[FormalContextParameters("owner")]class Bank : IOwnership{
...[ReadEffect("this", "listOwner"), WriteEffect("this"),FormalContextParameters("listOwner")]public void Apply(List<Transaction> transactions,
IOwnership Context listOwner) {Parallel.Pipeline.AddFirstStage
((Transaction transaction) => {transaction.Authenticate(auth, Context sub a);return transaction;
}, transactions).AddStage((Transaction transaction) => {
transaction.ValidateSource(vault, Context sub v);return transaction;
}).AddStage((Transaction transaction) => {
transaction.ValidateDest(vault, Context sub v);return transaction;
}).AddStage((Transaction transaction) => {
transaction.Apply(vault, Context sub v);return transaction;
}).Run<Transaction>();}
}
Listing 7.16: The implementation of the Zal transaction processing method in C]; note the pipelined foreach loop.
7.6 Performance
Having selected examples representative of a broad spectrum of different inherent par-
allelism patterns, I have demonstrated that my proposed system can detect and exploit
the major sources of parallelism contained in these examples. It now remains to mea-
sure the runtime and memory overheads involved in exploiting the parallelism detected
which is the focus of this section.
It is important to note that the runtime ownership tracking systems have not been
fully optimized. The most obvious opportunities for performance improvement have
been taken, but there are still a number of areas where additional performance may
yet be extracted. It is also important to remember that runtime ownership tracking
and context relationship testing is not required. In the absence of a runtime ownership
tracking system, any runtime relationship tests made by programs will fail, which will
result in the program relying solely on static reasoning.
struct Complex { public double real, imag; ... }

[FormalContextParameters("owner")]
class Solver {
    public int length;
    Complex[][] complexArray;
    private IOwnership Context_colContext;
    private IOwnership Context_rowContext;

    public Solver() : base() {
        Context_colContext = new SubContext(this);
        Context_rowContext = new SubContext(this);
    }

    [ReadEffects("this"), WriteEffects("this")]
    public void Solve() {
        // Transform the rows
        Parallel.ForEach(complexArray, (Complex[] row) => row.FFT(Context_colContext));
        complexArray.transpose();

        // Transform the columns
        Parallel.ForEach(complexArray, (Complex[] row) => row.FFT(Context_colContext));
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    [ReadEffects("this", "o"), WriteEffects("this")]
    public static void FFT(this Complex[] complexLine, IOwnership Context_o) {
        Complex[] W = new Complex[length];
        Context_o.AddChild(W);
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}
Listing 7.17: The compiler output for the Zal implementation of the Spectral Methods example.
As has been stated earlier in this chapter, the goal of these measurements is not to
measure an overall performance improvement from all of these examples since not all
of the detected parallelism is necessarily worth exploiting. The goal of this thesis is
to identify parallelism; determining when to exploit it remains an important open but
separate research problem.
7.6.1 Runtime Overhead
In this section, I present measurements of the overhead of my proposed runtime own-
ership tracking system which can be used to supplement the statically known context
relationships. I have compiled all four annotated examples using the Zal compiler to
produce two different implementations: one using the simple parent pointer algorithm
and one using Dijkstra Views. Figures 7.2, 7.3, 7.4, and 7.5 show the speed up graphs
for the ray tracer, calculator, bank transaction system, and spectral methods examples
respectively. Each of these speed-up graphs includes an ideal speed-up line showing
how a perfectly parallel algorithm with no sequential component would scale. These graphs show
the logical number of cores since the testing was performed on a hyper-threaded Core
i7 system and in the ideal case there should be no observable difference between the
number of logical and physical computational cores. The graphs also feature a per-
formance measurement of a version of each example that has been hand parallelized
using Microsoft’s TPL technology. This provides a reference point for determining the
overhead introduced by the runtime ownership tracking systems I have implemented.
The bank transaction system and calculator examples do not benefit from paralleliza-
tion overall, but the key point is that the performance of the ownership tracking system
has only a very small impact on the program execution time in these cases. To make
the overheads of the different systems more clear, Figures 7.6, 7.7, 7.8, and 7.9 show the
overheads as percentages of the hand parallelized execution time. Note that the differ-
ence is due solely to the overhead of the runtime ownership tracking system because
both the hand parallelized and automatically parallelized versions of the applications
used the same Task Parallel Library constructs to effect the parallelization.
Overall, the worst overhead measured for the pointer-based runtime ownership system
was 10%; in many cases the overhead of the pointer-based system was of the same order
as the noise in the sample times, which were obtained as an arithmetic mean of 30
repetitions of the measurement after the 3 fastest and 3 slowest measurements were
removed. The runtime overhead of the Dijkstra Views based system
was approximately 20% in the worst case, though it did perform very well when applied
to the calculator example. From the results, the pointer based ownership tracking
system was the better of the two tracking systems for all of the examples studied. The
Dijkstra Views system would likely be better in cases where the ownership trees are
tall and the number of object relationship tests high.
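For reference, the following is a minimal sketch of the parent-pointer scheme with a cached depth, which is the representation the overhead numbers above refer to; the interface and method names are illustrative assumptions rather than the actual runtime library API.

// Illustrative sketch of parent-pointer ownership tracking with a cached
// depth; names are assumed, not the actual Zal runtime library.
public interface IOwnershipNode
{
    IOwnershipNode Parent { get; }
    int Depth { get; }   // cached distance from the root of the ownership tree
}

public static class OwnershipTests
{
    // True if 'node' is 'ancestor' or is transitively owned by it.
    public static bool IsInside(IOwnershipNode node, IOwnershipNode ancestor)
    {
        // Walk parent pointers from the deeper node until the depths match.
        while (node != null && node.Depth > ancestor.Depth)
            node = node.Parent;
        return ReferenceEquals(node, ancestor);
    }

    // Two contexts are disjoint when neither is nested inside the other.
    public static bool AreDisjoint(IOwnershipNode a, IOwnershipNode b)
    {
        return !IsInside(a, b) && !IsInside(b, a);
    }
}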
7.6.2 Memory Overhead
Having discussed the runtime overhead of my proposed ownership tracking system,
it remains to explore the memory overhead incurred by these techniques. The ray
tracer and spectral methods examples are computationally bound problems which do
not create a significant number of objects and so are not used as part of this section
focused on measuring the memory overhead of each of the runtime ownership tracking
systems.
Figures 7.10 and 7.11 show the relative memory overheads for the two different own-
ership tracking systems in the calculator and bank transaction systems respectively.
These overhead measurements are made relative to the hand coded version using the
TPL. In the calculator example, the objects allocated in the program form a binary
tree. As the number of nodes increases so does the height of the binary tree and the
ownership hierarchy. The increase in the ownership tree’s height causes a proportional
increase in the amount of memory consumed by the Dijkstra Views ownership track-
ing implementation as expected. Overall the memory overhead of the pointer based
system was approximately 15% on average with the Dijkstra Views implementation
adding approximately a 30% memory overhead. These numbers could be further re-
duced with additional optimization including CLR and JIT support for the runtime
ownership tracking system. The pointer based system’s memory overhead could be
reduced by not caching an object’s depth in the ownership tree in the object, but this
would come at the cost of reduced runtime performance.
7.7 Summary
In this chapter I have presented several worked examples covering a variety of different
types of exploitable inherent parallelism. I have demonstrated that the major sources
of parallelism in these examples can be detected and exposed automatically when the
program is appropriately annotated with my type and effect system. On average, ap-
proximately 25% of the sample program source code required annotation with context
parameters or effect declarations. The small examples, the calculator and bank transac-
tion systems, had a higher percentage of lines that required annotation than the larger
more complex examples. The higher annotation percentages for the smaller examples
probably represent the worst case annotation overhead since larger methods amortize
the annotation cost across more lines of code.
With the use of my proposed system, all of the major sources of parallelism were found,
producing solutions which scaled in the same way as the hand parallelized versions did.
The runtime overhead of ownership tracking was between 10% and 20% in the worst
case, with memory overheads in the region of 15% for the pointer chasing system and 30%
for the Dijkstra Views system. Overall in this chapter, I have successfully implemented
a proof of concept system which has demonstrated that my proposed system is not
prohibitively expensive.
Figure 7.2: Speed-up graph for the ray tracer example.
Figure 7.3: Speed-up graph for the calculator example.
Figure 7.4: Speed-up graph for the bank transaction processing system.
Figure 7.5: Speed-up graph for the spectral methods example.
Figure 7.6: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the ray tracer example.
Figure 7.7: Graph showing the speedup in the calculator example when the runtime ownership tracking systems are enabled.
Figure 7.8: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the simplified bank transaction system.
Figure 7.9: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the spectral methods example.
Figure 7.10: Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. The O(n) memory usage of the Dijkstra Views implementation can be clearly seen as the number of nodes increases.
Figure 7.11: Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. Note that the height of the ownership tree does not increase with the data size in this example and so the Dijkstra Views memory consumption grows in proportion to the number of transactions.
Chapter 8
Comparison with Related Work
Reasoning about data dependencies, side-effects of evaluation and the presence of in-
herent parallelism have long been active areas of research in the field of computer
science. In this thesis I propose a system for reasoning about side-effects of evaluation
in an abstract and composable manner. I apply this system to the problem of finding
inherent parallelism in sequential programs. There is, therefore, a considerable body
of existing literature directly related to this work. In this chapter I explain how my
system compares and contrasts with other related systems documented in the literature
and how my system addresses issues not addressed by others. Traditionally, related
work chapters are presented early in a thesis, but I chose to defer this discussion so
that my comparisons could involve the full technical details of the systems concerned
which would not have been possible at the start of this thesis.
Current approaches to exploiting parallelism in programs can be classified into two
broad groups: speculative approaches and non-speculative approaches. The speculative
approaches execute portions of the program ahead of time based on predicted control
flow information. During speculative execution, these systems monitor the changes in
the program’s state. If there is a conflict between the main program and the speculative
execution, then the system reverts the changes made by the speculative execution to
return the program to a consistent state. Even with these speculative approaches,
the programmer must still define which operations need to be run together and which
can be broken apart. Further, the programmer may need to supply some or all of
the rollback operation in addition to the program itself. The current major exemplar
of this style of parallelization is Software Transactional Memory (STM). Speculative
approaches have the benefit of not requiring programmers to modify their programs,
but suffer from high overheads when conflicts are frequent or when few processors are
available to speculatively execute program fragments.
By comparison, the non-speculative approaches focus on determining the dependencies
present in a program. This information is used either to avoid parallelizing code when
doing so could cause dependencies to be violated or to add synchronization to prevent the
dependency violation. Both of these approaches have their pros and cons. In this thesis,
my work has focused on a priori reasoning about the presence of inherent parallelism
and it is most closely related to the non-speculative approaches that are the focus of
this chapter.
The existing non-speculative work most directly related to this thesis can be broadly
classified into six major areas: type systems, logics, traditional data dependency and
may-alias analyses, parallel programming languages and APIs, alternative concurrency
abstractions, and object-oriented paradigm considerations. None of these systems com-
bine abstraction and composition with parallelization and correctness checking to pro-
duce a framework which helps both programmers and automated tools to reason about
inherent parallelism. This thesis draws on ideas from all of these areas. In this chapter,
I aim to discuss the literature related to each of these different areas with a view to
placing my work into context with existing work in the literature and arguing for the
uniqueness of its contributions.
8.1 Background to Type Systems and Data Flow Analysis
Type systems are one of the key tools programmers use to reason about the behavior of
programs. The goal of a type system is to require programs to respect a set of predefined
invariants. A program is said to be well-typed when it respects the invariants required
by the language it is written in. Type systems come in a number of different styles
and strengths, from those which provide only limited invariant enforcement at runtime
to those which provide rigid and complex static invariant enforcement; I refer to these
as weak dynamically typed languages and strong statically typed languages respectively.
Actual type systems form a spectrum between these extremes. The stronger the type
system, the more invariants it can enforce and the more complex those invariants can
be. Unfortunately, this extra information about the behavior of programs, which can be
exploited by both programmers and automated tools, does not come for free. The more
rigid and complex the type system, the greater the effort required by the programmer
to annotate constructs with type information as well as create constructs designed just
to keep the type checker happy.
Programmers have long debated the merits of strong typing versus weak typing and the
merits of static typing versus dynamic typing. Opinions on the best choice tend to vary
with the dominant type systems of the day. When strong statically typed languages
are dominant, many programmers find the limited amount of type information required
when writing a program in a weakly typed dynamic language attractive. The rise in
popularity of dynamic languages such as Ruby, Python, and JavaScript over the last 10
years is evidence of this increased support [119]. As people switch to using weaker type
systems, they become aware that these systems cannot provide the same guarantees as
strong static type systems and, over time, interest flows back to strong static typing
for this reason. The efforts to statically check programs written in languages like
Ruby [43] and JavaScript [118, 62] indicate a need for more checking and validation
than currently provided by these dynamic languages. There is, therefore, a cost-benefit
analysis which generally takes place when designing a language and its associated type
system [107]. A good language allows important properties to be reasoned about with
minimal annotation effort (both syntactic and mental) [107].
Type systems have not, traditionally, been used to reason about data dependencies and
parallelism. As was just discussed, type systems aim to enforce invariants across entire
programs. Data-flow analysis, by contrast, can be thought of as a set of techniques
used to discover invariants in a program. Program invariants can be used for a number
of purposes including detecting inherent parallelism and program optimization oppor-
tunities. Laud, Uustalu and Vene have shown that type systems are equivalent to data
flow analyses for imperative programming languages [71]. Some invariants may be most
easily enforced using a type system while others may be more amenable to discovery
using data-flow analysis. Often, the two techniques can be used to complement each
other. For example, when trying to determine the range of possible values a variable
may have, the type system may enforce the range of possible values is limited to a
particular domain, such as the natural numbers, while data-flow analysis may provide
a more precise range of values, such as natural numbers between 0 and 100.
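As a small concrete illustration of this complementarity (a hypothetical example, not drawn from the thesis case studies):

// The parameter type (uint) enforces the domain invariant "value is a natural
// number"; a simple data-flow analysis of this particular method can then
// discover the tighter invariant that the result lies between 0 and 100.
static uint ClampToPercent(uint value)
{
    return value > 100 ? 100u : value;
}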
Traditionally, the parallelism community has used data-flow analysis to detect data de-
pendencies and compute side-effects. Programs written in modern imperative object-
oriented languages tend to have more complex aliasing and data flow patterns compared
to programs written in more traditional imperative programming languages. Unfortu-
nately, these additional complexities complicate the application of traditional data-flow
analysis techniques to programs written in these languages. For example, a traditional
context sensitive or context insensitive may-alias analysis is not well suited to use in
an object-oriented program; a superior solution is possible by modifying the approach
to suit the language [86].
The purpose of data-flow analysis in the parallelization process is essentially to discover
invariants which can be used to compute data dependencies between different parts of
the program. Unfortunately, modern imperative programming languages do not have
features to help facilitate the discovery of these program invariants. In this thesis I
have proposed language features which help to facilitate the discovery of these invari-
ants while minimizing the syntactic and semantic burden on the programmer. These
language features I have developed may not provide the same precision as traditional
dependency techniques, but they are more abstract and composable which helps to
facilitate reasoning in large programs.
One of the fundamental principles of object-oriented programming is encapsulation;
an object’s internal state should be protected from external access and modification.
The use of encapsulation provides a number of benefits including simplified design,
debugging, maintenance, and reuse. Popular object-oriented programming languages,
like Java and C], do not provide language features for strong encapsulation enforcement;
they provide only limited name protection. Type systems for encapsulation enforcement
and tracking have been heavily studied by the verification community. These systems
can form a basis for building a hierarchical effect abstraction system which can be used
to help simplify a number of different data-flow analyses.
8.2 Traditional Data Dependency Analysis
The traditional approach to performing a data dependency analysis is to compare,
pairwise, all of the statements in the code fragment being analyzed to determine the
nature of the dependency between the two statements, if any. The data dependency
analysis itself operates on the level of individual variables. These analysis techniques
operate on value types by checking for variable name equality to detect whether a dependence
can exist. When reference types are encountered, a may-alias analysis needs
to be performed to determine which variables might actually be referring to the same
object at runtime.
My approach, in contrast with traditional data dependence analysis, uses methods as
the fundamental unit of effect abstraction. Dynamic binding and overriding can cause
significant problems for traditional data dependency analysis techniques because the
implementation of the method which will be run at runtime needs to be determined.
My approach avoids this problem by enforcing effect consistency during overriding and
so declared effect signatures can be trusted to describe the maximum possible effects
of a method even if that method is later overridden in another class.
8.2.1 Array Dependence Analysis
A number of different techniques have been proposed to try to answer questions about
the relationships between different array index expressions. These techniques operate
only on affine indexing expressions. One of the simplest approaches, proposed by
Banerjee, uses the GCD to determine if the two array index expressions could be
equal [11]. Banerjee requires the loop to be normalized to iterate from 1 to a terminal
value incrementing by 1. Once the loop is in this form, the array index expressions
are arranged into the form a * i + b and c * i + d. Banerjee proved that if a loop
carried dependence exists then GCD(c, a) must divide (d − b).
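As a concrete illustration, the GCD test can be phrased in C# as follows (a minimal sketch written for this thesis discussion; the class and method names are invented, and the test is stated exactly as above: a dependence between accesses with index expressions a * i + b and c * i + d can exist only if GCD(a, c) divides (d − b)).

using System;

static class BanerjeeGcdTest
{
    // Greatest common divisor via Euclid's algorithm.
    static int Gcd(int x, int y)
    {
        x = Math.Abs(x);
        y = Math.Abs(y);
        while (y != 0)
        {
            int t = x % y;
            x = y;
            y = t;
        }
        return x;
    }

    // Returns true if a dependence between accesses A[a*i + b] and A[c*i + d]
    // may exist; false means such a dependence provably cannot exist.
    public static bool MayDepend(int a, int b, int c, int d)
    {
        int g = Gcd(a, c);
        // If GCD(a, c) does not divide (d - b), the two index expressions can
        // never be equal for any integer values of the loop counter.
        return g == 0 ? (d - b) == 0 : (d - b) % g == 0;
    }
}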
Techniques for detecting when array index expressions could lead to dependencies
through arrays continued to evolve during the 1990s. In 1991, it was proved that
solving a system of constraint equations for array index expression to find out if they
could cause a data dependency on the array is an NP-complete problem [75]. One
of the most advanced techniques which allows precise solutions to systems of affine
array index equations to detect dependencies was proposed by Pugh in the form of
the Omega Test [98, 97]. The Omega Test was ultimately developed to show that
for systems of affine equations, the data dependence analysis could be performed in a
reasonable amount of time [68].
The approach I have proposed, which operates on collections and iterators rather than
arrays, avoids the affine constraint restrictions common to traditional array data de-
pendence techniques. As with many of these traditional techniques, my system still
requires the elements of the collection being traversed to be unique, which is not an
easily discovered or enforced program invariant. Traditional dependence analysis tech-
niques may be able to find some opportunities for parallelism which cannot be found
using my system because of the abstraction of effects in terms of contexts. What is
gained, however, is a greatly simplified ability to reason about data parallel loops and to
handle more general purpose loops than those traditionally focused on when considering
array data dependence analysis. Finally, my techniques describe effects in an abstract
and composable manner which helps prevent the explosion in the number of comparisons
that occurs with the traditional pairwise consideration of loop body statements.
8.2.2 May-Alias Analysis
A may-alias analysis is a static analysis and so it does not have access to actual pointers
and objects. Instead of tracking objects and memory addresses, may-alias analyses try
to disambiguate variables using allocation site information. An allocation site is the
point in the program (for example, a particular new expression) at which an object is
allocated. Therefore, if two objects are created at
the same allocation site, a traditional may-alias analysis would identify that the two
objects could be the same. This is an approximation of the program’s actual behavior.
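For example, in the following C# fragment (illustrative code written for this discussion, not drawn from any particular analysis tool), both lists are created at the same textual allocation site, so an allocation-site based may-alias analysis must conservatively report that the two references may alias even though they never do at runtime.

using System.Collections.Generic;

class AllocationSiteExample
{
    // A single allocation site: every call returns a distinct list,
    // but all of those lists share the same "new List<int>()" site.
    static List<int> MakeList()
    {
        return new List<int>();
    }

    static void Main()
    {
        List<int> xs = MakeList();
        List<int> ys = MakeList();

        // xs and ys are distinct objects at runtime, yet an allocation-site
        // based may-alias analysis must assume they may refer to the same
        // object because both came from the same site.
        xs.Add(1);
        ys.Add(2);
    }
}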
One important technique used to improve the precision of may-alias analysis is context
sensitivity. In context insensitive analyses, each method body is analyzed in isolation
with no information about the parameters supplied to the method [86]. Context sensi-
tive analyses are far more precise than context insensitive analyses: each method body
is analyzed once for each site from which it is called, and the information from the calling
context is used to help make the may-alias analysis results more precise [86].
Milanova et al. studied the application of Andersen’s C may-alias analysis, a traditional
context sensitive analysis, to the Java programming language [86]. They found that
the analysis does not take into account the target of a method’s invocation. Because of
this, a method which modified one object’s state appeared to be manipulating the state
of all objects of the same type [86]. Milanova et al. proposed a new object sensitive
analysis which distinguished methods invoked on different allocation sites which allowed
the updating of different object states to be distinguished from each other [86]. This
object sensitive may-alias analysis has subsequently been refined and improved by a
number of other authors, most recently Bravenboer and Smaragdakis [22].
Unfortunately, even with the development of new object sensitive analyses, these tech-
niques are still only approximations of the actual behavior of the program. The main
problem is that the languages to which these analyses are applied do not provide any
features to help distinguish aliases or track the flow of data through a program. This
thesis contributes an approach which shows how an object-oriented language can be
modified to help provide this tracking and how this can, in turn, be used to reason
about the behavior of a program. My approach may not work as well for some specific
scientific applications, but it is better able to handle programs written in modern im-
perative object-oriented languages. With further study, a combination of my approach
and more traditional approaches may allow programmers to get the best of both worlds.
8.3 Automatically Parallelizing Compilers
Creating compilers to automatically parallelize sequential programs has long been a
goal of computer science research [15]. A lot of research was done in this area in the
1980s and 1990s with a primary focus on compilers for the Fortran77 programming lan-
guage [51, 17, 11, 10]. This work has resulted in a number of commercial automatically
parallelizing Fortran compilers like the Intel Fortran Compiler and IBM XL Fortran to
name but a few.
The Fortran77 language is a relatively simple language from a program analysis per-
spective [107]. It does not employ any of the language features which make analyzing
programs in more modern languages such as Java and C# so difficult, including pointers,
reference types, dynamic binding, and dynamic linking. All of these modern language
features provide avenues for aliasing, and Fortran is easier to analyze largely because of
this.
There was also work done on creating parallelizing compilers for C [70] and later Java [8].
The added complexity of these languages meant that the compilers were often unable
to determine if a loop could be safely parallelized. To work around this problem, many
compilers allowed programmers to provide hints to guide the parallelization of their
programs (for example High Performance Fortran for distributed memory systems [54]).
Others built tools where programmers could interact with the compiler to help decide
where parallelism could be safely employed (for example SUIF Explorer [72] and the
Polaris compiler [17]). The hints and parallelization decisions made by programmers in
these systems go largely unverified, which can allow many subtle bugs to be introduced
into programs through the parallelization process.
My system provides an alternative form of program annotation from the unverified an-
notations employed in previous systems. The annotations employed in Zal are verified
by the compiler, which prevents programmer parallelization annotations from introducing
subtle bugs into programs. Zal is also better able to handle the complex language
features and aliasing patterns found in modern languages than previous automatic
parallelizing compilers.
8.4 Type and Effect Systems
The goal of this section is to provide an overview of type and effect systems for ab-
stracting the structure of a program’s data in a hierarchical and composable manner.
In addition, I compare and contrast the approach I have taken in this thesis to these
systems. It is important to note that the application of these type and effect systems
to the problem of undertaking full data-flow and data dependency analysis has not, to
the best of the author’s knowledge, been attempted previously.
One of the early object-based effect systems was proposed by Greenhouse and Boyland
who developed a memory region-based effect system which abstracted effects using
programmer-defined memory regions [49]. Effects in terms of programmer declared memory
regions do not abstract and compose well due to the lack of a rigorous relationship
structure underlying them.
8.4.1 FX
FX is a programming language developed in the 1980s [45] which is a member of the
Scheme-like family of functional programming languages. It possesses both purely func-
tional operations as well as operations that permit the modification of shared mutable
state [45]. The most notable feature of FX was the addition of explicit effect annotations
to the language. Programmers were required to specify the side-effects of evaluating
a function on its signature [45]. The annotations were not verified by the compiler,
but were used by the compiler to determine which pieces of code could be executed in
parallel [45]. The language syntax was greatly complicated by the side-effect annota-
tions. Further, the lack of compiler verification of the methods’ declared side-effects
could result in obscure, hard to reproduce bugs being caused by incorrect or omitted
annotations; the errors produced would provide no hint that the annotations were the
cause of the problem. While the effect annotations introduced in FX were useful for
parallelization, the subtle bugs the annotations could cause demonstrate the impor-
tance of inference, defaults, and compiler verification to ensure that the annotations
are minimally burdensome on the programmer and that they are correct.
8.4.2 Ownership Types
With the popularization of object-oriented programming in the 1990s, software veri-
fication researchers became interested in trying to detect and enforce encapsulation.
Some of the earliest type systems to enforce encapsulation were proposed by Almeida
(Balloon Types [5]) and Hogg (Islands [57]). These systems could enforce only weak
encapsulation invariants. Around the same time as the work of Greenhouse and Boyland,
Ownership Types were proposed as an extension and further refinement of these
early encapsulation systems. Ownership Types provided much stronger encapsulation
enforcement [28, 91]. Subsequent work has liberalized Ownership Types to allow
the systems to be used for tracking encapsulation rather than rigorously enforcing it.
Many common object-oriented design patterns, such as iterators, violate encapsulation
and so by separating the encapsulation enforcement from the encapsulation tracking,
ownership systems can be used to express these patterns which would otherwise be
prohibited.
Over time, two distinct families of ownership type systems have emerged: Ownership
Types and Universe Types. The main difference between these two families is in the
mechanism of tracking encapsulation relationships. Ownership Types track encapsula-
tion using explicit ownership parameters. Universe Types, by contrast, use only relative
notations such as rep on fields holding object representation and shared on fields hold-
ing data which are part of another object’s representation. An excellent summary of the
early work in the field can be found in “Types for Hierarchic Shapes (Summary)” [41].
In 2006, Nageli published a masters thesis which showed that a number of common
object-oriented design patterns could not be expressed in languages employing the then
state-of-the-art Universe and Ownership Types systems [90]. Nageli identified a number
of type system features required to make ownership systems in general compatible with
the design patterns studied. These features included ownership transfer, read only
references, and an ability to share objects between restricted sets of contexts [90].
Following Nageli’s thesis, a number of new ownership type systems have been published
which begin to address the shortcomings he identified. MOJO [25] allowed objects to
have more than one owner, thus transforming the traditional ownership tree into a
directed acyclic graph (DAG). MOJO also provided wildcards so that the types of
objects with multiple owners can be named without all of the owning contexts being
nameable. This simplified writing programs using the system [25]. Cameron’s Jo∃
added existential types to traditional ownership systems to allow provably-safe, con-
strained ownership variance [24]. Lu and Potter proposed Effective Ownership Types
which added a wildcard any context, which can be used to abstract owners of types
along with effective owners on methods to constrain the mutable side-effects of the
method [73]. Lu, Potter, Xue have also proposed Oval [74], a language which employs
validity contracts to determine which parts of a system a given method depends on and
which parts it may modify. The validity contract consists of a list of contexts which
must be valid before the method’s execution and a list of contexts invalidated by the
method. This is a more general approach to the standard read and write effects such
as those I have used in my system. Using these validity contracts, it might be possible
to reason about parallelism, but no work has been published on this to date.
There has also been an effort to create formal core calculi to prove type system properties.
Some of the more notable are Ownership Generic Java [95] (Featherweight Generic Java
extended with ownership) and System Fown [69], which adds ownership to System F.
System F, otherwise known as the second-order λ-calculus, is a general calculus for
languages with parametric polymorphism and can be used to model the behavior of many
other languages. Adding ownership to System F is therefore an important result from a
type theoretic point of view, since it generalizes ownership to a large family of languages. These formal
calculi are not directly applicable to reasoning about parallelism, but they can be used
as the basis for type and effect systems which can be used to prove properties useful
for parallelization.
Ownership Types research has produced a family of type systems with a number of
advanced features designed to facilitate typing of a number of complex programming
patterns and object relationships. The choice of which language features to employ is
determined by the types of program being analyzed and the analyses being employed.
In my ownership system, I use language features and ideas from a number of different
sources. My use of context parameters and effects resemble those of Joe [27]. The sub-
contexts found in my system resemble domains in Ownership Domains [3]. The effect
declarations used resemble those of Smith [108] and Joe [27]. Finally, the constraint
clauses resemble those found in MOJO [25]. A more in-depth discussion of these points
is undertaken in Section 5.1, and I refer interested readers there. My system is capable
of supporting many of the more advanced language features found in recent Ownership
Types systems, but I have not yet needed to use them.
8.4.2.1 Ownership Side-Effects
Early in the development of ownership systems, it was realized that the hierarchical
ownership structure was well suited for use as a framework for abstractly describing
side-effects. These effects have been applied to several different problems, generally
related to validating programs. The idea of using these systems for reasoning about
parallelism has been mentioned in the past, but has not been thoroughly explored or
demonstrated before this thesis.
Clarke and Drossopoulou were amongst the first to propose an effects system based on
Ownership Types called JOE [27]. JOE provided facilities for capturing and validating
effects. JOE also included a system for statically reasoning about the disjointness of
effects. The disjointness operations formulated in JOE resemble those in Zal, but the
effects were not used to reason about data dependencies or other program properties.
JOE’s effect system focused on tracking write effects and did not include separate read
effects. The separation of read and write effects is important for computing accurate
data dependency information for use in finding inherent parallelism, as I found necessary
when formulating sufficient conditions.
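To illustrate why this separation matters, consider the following C# sketch (the arrays and task structure are invented for this example and use the .NET Task Parallel Library only as a convenient notation): both tasks read the same input array but write to disjoint output arrays. Establishing that they may safely run in parallel requires knowing both the read and the write effects of each task, since only the combination shows that neither task writes data the other reads.

using System.Threading.Tasks;

class ReadWriteEffects
{
    static void Process(int[] input, int[] squares, int[] cubes)
    {
        // Both tasks read 'input' but each writes only to its own output
        // array, so their read/write effect sets do not conflict and the
        // two tasks may execute in parallel.
        Task a = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < input.Length; i++)
                squares[i] = input[i] * input[i];
        });
        Task b = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < input.Length; i++)
                cubes[i] = input[i] * input[i] * input[i];
        });
        Task.WaitAll(a, b);
    }
}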
Lu and Potter proposed Effective Ownership Types [73] which built on traditional
Ownership Types systems and focused on trying to separate ownership based effect
systems from encapsulation enforcement. Their system was not directly applied to any
real-world problems, but did present a number of innovative ideas. The any context in
their system is, for my purposes, equivalent to the world context when read or written.
This limits parallelization opportunities because a read or write of the any context
includes all other contexts in the system. In addition, the lack of method side-effect
declarations in Effective Ownership Types limits the ability to reason about side-effects
in the presence of overriding.
Several of the more recent Ownership Types systems including MOJO [25] have begun
to include effect systems as a matter of course in formulating new ownership systems.
These formalizations look at how the new language and ownership features proposed
impact the effect system. However, these type system papers do not apply the effects
system to actual programming problems such as parallelism.
8.4.2.2 Applications to Parallelism
In the literature there have been suggestions of using ownership types for purposes
relating to parallelism. This work has tended to focus on the validation of existing
parallel programs and lock ordering in particular. Boyapati, Lee, and Rinard proposed
a system for validating lock ordering using Ownership Types [20]. Their effect system
captures the contexts locked by a particular block of code and succeeds in preventing
deadlock using these annotations. Milanova and Liu have presented a system which uses
the notion of ownership to detect synchronization errors [85]. They do not modify the
type system, but use the notion of ownership and encapsulation to construct a graph
of the relationships between objects which is used in their analyses. Unfortunately,
these systems capture effects as sets of contexts locked and not as sets of contexts
read and written. The sets of contexts locked do not provide sufficient information to
reason about inherent parallelism. In addition, my system adds a runtime system which
allows the relationships between context parameters and the disjointness of elements
in collections to be tested; a feature not found in these parallelism validation systems.
8.4.2.3 Summary
The area of Ownership Types has rapidly matured in the last few years. The literature
documents a number of advanced type systems with features to support a number of
different programming styles and reasoning systems. The choice of which to employ
in a particular language is determined by the problems the language is expected to
be applied to. A number of effect systems have been proposed based on Ownership
Types. To date, the parallelism applications have been limited to validation. Lock order
validation is a different problem that requires less complex reasoning than the general
computation of side-effects required to detect inherent parallelism (see Chapter 5) and
Zal (see Chapter 4).
8.4.3 Universe Types
Universe Types is an alternative branch of Ownership Types which is designed to facili-
tate encapsulation tracking and enforcement [88]. Universe Types are significantly more
lightweight than Ownership Types in terms of annotation overhead, both in the number
of annotations and their complexity. Rather than having to specify explicit owners on
types, fields are simply marked as being part of the representation, shared with other
objects, or other relationships depending on the specific system used. This reduced
complexity has both advantages and disadvantages. The reduced complexity makes
it easier to implement some language features which have been quite hard to imple-
ment in Ownership Types such as ownership transfer [89] and ownership inference [83].
Recently, Dietl has published a PhD thesis, in which he has cleanly separated the own-
ership annotations from encapsulation enforcement so that the ownership annotations
can be used simply as a hierarchical description of the program’s data structures [38].
The less precise nature of the effects possible with Universe Types makes the system less
suitable than Ownership Types for reasoning about data dependencies and parallelism.
Effects can only be described as occurring on objects in a representation, peer, or other
relationship to the current object. Using such a system, it is not possible to determine
if references to objects with the same type and annotation can interfere, which is much
less useful than the more precise owner effects possible with Ownership Types.
8.4.3.1 Applications to Parallelism
Cunningham, Drossopoulou, and Eisenbach proposed a system for validating a program
to check that locks are taken on shared resources to prevent data races [34]. As was the
case with the work of Boyapati et al. [20] discussed above, the effect system does not
capture effects in a sufficiently broad and precise manner to facilitate reasoning about
inherent parallelism as my proposed system does.
8.4.4 Ownership Domains
In Ownership Domains, proposed by Aldrich and Chambers [3], programmers declare
explicit memory regions, called domains, in which to store data. The key difference
between Ownership Domains and the earlier work of Greenhouse and Boyland [49] is
that Ownership Domains use ideas from Ownership Types to provide some default re-
lationships between domains in addition to allowing programmers to declare explicit
access permissions on domains. Ownership Domains also allow types to be parameter-
ized with domain parameters so that domains can be passed to objects. The domain
parameters may be constrained to restrict the domains which can be supplied when
instances of the type are constructed. The sub-contexts in my proposed system are
similar to the domains in Ownership Domains. Ownership Domains are flexible and
can capture more complex and subtle object relationships than traditional Ownership
Types. Building a hierarchy from the declared domains is much more complicated due
to the complex inter-domain relationships that are permitted in Ownership Domains.
8.4.4.1 Effects
Smith has proposed an effect system based on Ownership Domains [108]. Smith’s
proposed effects system employs explicit read and write sets such as those used on
methods in my system. Similarly to the system I propose, Smith's system has the ability
to constrain relationships between domain parameters. This is equivalent to the
context constraints used in my system. Some of Smith’s effects resemble the effects I
have chosen to employ, while others not found in my system are based on additional
expressiveness inherent in Ownership Domains. These additional annotations may be
useful in solving parallelism related problems, but come at the cost of the increased
annotation complexity and overhead of the Ownership Domains system. Smith made
no attempt to apply his effects system to a specific domain such as parallelism. While
the work contributes a number of interesting ideas which may be useful, it does not
discuss how to use the system to solve specific problems.
8.4.5 Boxes
Boxes [106] and Loose Ownership Domains [104] are both type systems based on Own-
ership Domains. As with the basic Ownership Domains type system, the programmer
explicitly declares the regions in which objects live and can control the access per-
missions and relationships between regions. These type systems add to Ownership
Domains a mechanism for abstracting domains. This increases the flexibility
and modularity of the type system. The use of sub-contexts in my proposed system is
basically a subset of the functionality of these type systems and provides some of the
same power and flexibility without the need to always declare domain relationships.
The Boxes type system has been applied to the actor programming model to help
enforce object encapsulation in the CoBoxes type system [105]. In the CoBoxes type
system, classes can be marked as CoBoxes. Each CoBox has at least one stream of
execution and invoking methods on the CoBox is done asynchronously. The domains
in the type system enforce encapsulation so that data in CoBoxes is disjoint to prevent
data races between CoBoxes. Similarly to other actor and message passing based
systems, the CoBox concurrency model is powerful and flexible, but often requires
existing programs to be restructured before they can take advantage of the language’s
parallelism features. Further, the language does not assist the programmer with the
task of deciding which classes should be CoBoxes and which should not, unlike the system
I have proposed.
8.4.6 Uniqueness, Read-Only References and Immutability
The major alternative to using Ownership Types for encapsulation enforcement involves
the use of unique and read-only references to objects. By restricting a reference to be
read-only, the reference cannot be used to violate encapsulation since modifications of
the object being referred to are not permitted. By restricting access to encapsulated
state, the process of ensuring invariants are enforced is greatly simplified.
The origins of this approach can be traced to Hogg’s Island types [57]. In Hogg’s
system, objects are grouped; within each group unrestricted aliasing is allowed, but
external aliases to the group must be made via the single bridge object that connects
the objects in the “island” to the rest of the world.
Uniqueness type systems have not been as well studied on their own, but they have
been used in conjunction with other encapsulation tracking and enforcement systems to
provide greater flexibility and expressive power. There have been efforts to add these
features to traditional Ownership Types systems including Featherweight Ownership
with Immutability Generic Java [129]. No major efforts have emerged trying to use
these properties to exploit the side-effect restrictions implied by these techniques for
parallelism verification or parallelization purposes.
8.4.7 SafeJava
SafeJava is an extension of the Java programming language. Its type system is based
on Java’s type system extended with ideas from Ownership Types [19]. The simplicity
and flexibility of the type system falls somewhere between that of Ownership Types
and Universe Types. SafeJava validates synchronization between threads to prevent
data races and deadlocks as well as providing region-based memory management. The
system for reasoning about data races and deadlocks focuses on reasoning specifically
about locks. Methods can be annotated with lists of locks that must be held when
the method is invoked and lists of locks taken by the method. There are a number of
advanced features for reasoning about locks and thread local storage, but the system
does not attempt to discover where inherent parallelism exists. This is a very different
problem since it requires discovery of data dependencies using effect information.
8.4.8 Deterministic Parallel Java
In late 2009, Bocchino et al. published a paper on Deterministic Parallel Java (DPJ) [18],
which is probably the closest competitor to the system I have presented in this thesis.
DPJ uses a variant of ownership types to reason about data dependencies and
inherent parallelism.
The major difference between DPJ and Zal lies in how contexts are specified. In
DPJ, contexts are specified using Region Path Lists (RPLs) rather than simple names.
These RPLs list the contexts between the context being named and the root. Parts of
the list may be abstracted using an asterisk (*). This difference in how contexts are
named has a significant impact on how context disjointness is determined. In DPJ,
the disjointness of two contexts can be determined by comparing their RPLs to see if
there are any common contexts which would indicate that the named contexts share a
parent-child relationship. The use of RPLs have allowed DPJ to be used to successfully
detect and exploit inherent parallelism in a number of traditional scientific computing
benchmarks. Unfortunately, the use of these RPLs could cause implementation details
to be leaked through the context and effect abstractions. Appropriate use of the asterisk
(*) may help to prevent this, but even occasional leakage could prevent the system
from composing as mine does. In addition, the use of RPLs requires programmers to
maintain awareness of the ownership structure in their entire program rather than just
local ownership information as is required in Zal.
Unlike Zal, DPJ does not make use of a runtime ownership tracking system and so does
not provide support for conditional parallelization. DPJ requires the same collection
disjointness as Zal, but takes a different approach to enforcing the condition. In DPJ,
the compiler allows the programmer to use a form of dependent types to try to verify
statically that element access operations in an array are disjoint and can be run in
parallel. This is a complex undertaking and requires the programmer to identify and
expose index variables and other implementation details to the DPJ compiler, unlike the
iterator based approach used by Zal. The DPJ approach is well suited to traditional
scientific computing tasks, the focus of the DPJ system, but is more complex and
exposes more implementation details than the approach I used in Zal. I chose to
use a runtime disjointness test so that more general purpose loops could be handled
by Zal. In terms of sufficient conditions for parallelism, DPJ uses RPL effects in a
traditional data dependency analysis. The authors do not propose sufficient conditions
in terms of RPLs. The sufficient conditions I have presented for Zal can be used by
programmers, as well as automated tools. This helps facilitate reasoning about side-
effects and dependencies.
DPJ is a very powerful, flexible, and useful system, but it lacks the ability to compose
as effectively as Zal. The DPJ authors have demonstrated that their system is able to
parallelize a number of traditional scientific benchmarks including benchmarks from the
Java Grande suite which Zal is not currently able to parallelize. The RPL ownership
annotations employed by DPJ are more complex than the simple ownership annotations
employed by Zal. As a result, it is harder to annotate a program using DPJ RPL-style
annotations than it is to annotate the same program with Zal-style annotations. Zal’s
use of runtime ownership tracking allows for data-dependent program behavior which
is not possible with DPJ and so Zal can conditionally exploit parallelism in situations
where DPJ cannot (for example, the hash table example in Chapter 4). It may be
possible to combine the features of these two systems in the future to have the best of
both worlds, but how to do so is an open research question.
8.5 Logics
Computer scientists have long used formal mathematical tools to try to reason about
programs. Some of these techniques employ numerical methods. Other techniques
rely on formal logic to try to prove properties of programs. In this section, I briefly
discuss two logic systems which can be used to reason about data dependencies and
parallelism and how these formal systems relate to the system I propose in this thesis.
These systems provide a formal basis from which to build, but they are not directly of
practical value in that they cannot be directly added to a language. The logic systems
are abstract reasoning systems and are not formulated for a specific language or set
of syntactic features. These logic systems, generally, need to be incorporated into a
language’s type and effect system at some level. This is usually a non-trivial task.
8.5.1 Hoare & Separation Logic
One notable contribution towards describing programs logically and proving properties
through the development of axioms and traditional proof techniques was made by Hoare
in his 1969 paper “An Axiomatic Basis for Computer Programming” [56]. In Hoare
Logic, as this contribution has been named, programs are described in terms of the
pre-conditions which must hold before they are run and post-conditions which describe
the result of the program's execution. Hoare introduced the notation P{Q}R, where
P is the set of pre-conditions, R is the set of post-conditions, and Q is the program to
which the conditions apply.
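For example (written here in the now more common {P} Q {R} layout rather than Hoare's original P{Q}R), the triple for a single assignment is:

\[
  \{\, x = 1 \,\}\;\; x := x + 1 \;\;\{\, x = 2 \,\}
\]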
Hoare logic has been applied to many problems, but the most relevant of these for the
purposes of reasoning about parallelism is Separation Logic simultaneously discovered
by John Reynolds [101] and Ishtiaq and O’Hearn [61]. In Separation Logic, the pre and
post conditions describe the state of the program before and after execution as they do
in Hoare Logic [56]. Parallelism is determined by proving the independence of the pre
and post conditions for two programs to be run in parallel. As long as the state required
and modified by the two programs, as specified in the pre and post conditions, does
not contain any shared values there are no shared dependencies between the programs
and they can be run safely in parallel [102]. Reynolds presented some pre and post
conditions for some common language features to demonstrate the use of the system
and its power [102].
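This reasoning is captured by the disjoint concurrency (parallel composition) rule of Separation Logic, shown below in a simplified form; side conditions about variables shared between the two commands are omitted here.

\[
  \frac{\{P_1\}\; C_1\; \{Q_1\} \qquad \{P_2\}\; C_2\; \{Q_2\}}
       {\{P_1 * P_2\}\;\; C_1 \parallel C_2 \;\;\{Q_1 * Q_2\}}
\]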
Separation Logic is a strong theoretical foundation for reasoning about programs due to
its derivation from the fundamental logic underpinning programming. There has been
work on using Separation Logic to reason about inherent parallelism in programs, but
this work has been in conducting proofs and validating parallelization techniques [99].
The application of separation logic to real-world language features is complex and
difficult. The average programmer would not want to be burdened with having to
describe their entire program in terms of Separation Logic pre and post conditions.
8.6 Programming Languages for Parallelism
Each programming language has a combination of features and syntax which make it
unique. Often, new language features are created to solve a problem not easily solved
with existing language constructs. In this section, I examine several different languages
and compare and contrast their approaches to parallelization with my system. When
parallelizing a program, there are three questions which need to be answered:
1. Can the program be safely parallelized?
2. Should the program be parallelized; is it worthwhile doing so?
3. How should the program be parallelized?
Existing languages generally address only one or two of these questions. Overall, there is
no language with simple syntax and imperative style computation that provides features
which facilitate reasoning about data dependencies and parallelism. The languages
discussed in this section have a number of interesting features for expressing parallelism,
but there is no validation of whether the parallelism added to programs using these
constructs violates sequential consistency or not.
8.6.1 Haskell & other Functional Languages
The Haskell programming language was the product of an academic initiative to create
a standard language platform for performing research into functional programming [59].
Haskell is the current major exemplar of a lazy evaluating functional language and so
includes a number of unique features [59].
Having been created for functional programming research, Haskell was designed as
a purely functional programming language with no shared mutable state and conse-
quently all functions are side-effect free. Initially, operations with inherent side-effects,
such as I/O, were modelled with great difficulty and some side-effecting operations had
to be implemented outside the Haskell language itself [59]. This problem was solved
by the introduction of Monads into Haskell to model mutable state, I/O, and all other
operations with side-effects [64].
The lack of side-effecting operations in purely functional programming languages means
that multiple function invocations can be evaluated in parallel. This means that these
purely functional programming languages help programmers to answer the question
of can the program be parallelized. These languages do not, however, address the
questions should the parallelism be exploited or how to exploit the parallelism when it
is worth exploiting.
Haskell uses monads to add operations with side-effects to the language without violat-
ing the functional purity of the language. Monads provide a type system mechanism for
sequencing side-effecting operations so that the Haskell interpreter respects the speci-
fied order for the side-effecting operations. Essentially, monads are a means of explicitly
sequencing operations in an inherently parallel language which is the opposite of my
system which tries to expose parallelism in an inherently sequential language. Because
monads provide a means of explicitly sequencing operations, it is possible that there
could be inherent parallelism amongst the sequenced operations. The Haskell type
system does not provide features to facilitate reasoning about the inherent parallelism
which may exist within operations sequenced using monads.
Haskell, similarly to many other functional programming languages, also has a number
of other powerful syntactic constructs. The most interesting of these is the list compre-
hension. List comprehensions allow for the generation of infinite data sets as well as the
application of operations to elements of a list without the need for explicit iteration over
the list itself [59]. This construct is ideal for parallelization since dependency detection
is greatly simplified and the construct is highly declarative in nature. The use of this
construct removes indexing and other iteration specific notations. By making the itera-
tion implied rather than explicit, loop carried dependencies are no longer an issue. The
transformation being applied to each data element is also made explicit which is ideal
for parallelization and dependency detection. The enhanced foreach loop proposed
in my system (see Section 2.4.1.2) is also another more declarative means of writing a
loop. It is not as powerful and flexible as Haskell list comprehensions, but it serves a
similar purpose while providing a syntax more similar to traditional loops than that of
Haskell list comprehensions.
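The contrast can be sketched within C# itself (illustrative code only, using LINQ's Select as a rough analogue of a list comprehension): the indexed loop forces a parallelizer to reason about index expressions, while the declarative form states the per-element transformation directly and leaves no index arithmetic to analyze.

using System.Linq;

class DeclarativeLoops
{
    // Indexed form: a parallelizer must reason about the index expressions
    // to show that the writes to 'result' are independent.
    static int[] SquaresIndexed(int[] input)
    {
        var result = new int[input.Length];
        for (int i = 0; i < input.Length; i++)
            result[i] = input[i] * input[i];
        return result;
    }

    // Declarative form: the element-wise transformation is explicit and the
    // iteration is implied, not spelled out.
    static int[] SquaresDeclarative(int[] input)
    {
        return input.Select(x => x * x).ToArray();
    }
}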
Haskell is a powerful language with many desirable parallelization properties. The lazy
functional programming paradigm is quite different from the imperative program-
ming paradigm used by the majority of developers writing general purpose software.
These differences can create a barrier for those who would otherwise be interested in
using Haskell to improve the performance of their programs. While Monads order side-
effecting operations, exploiting an inherent parallelism between the ordered operations
is not currently supported by the Haskell language. Overall, the Haskell language of-
fers a number of interesting and powerful features which may contribute to solving the
problem of how to parallelize programs written using modern object-oriented languages,
but it does not provide a complete solution to the problems tackled in this thesis.
8.6.2 Cyclone
Cyclone is a C dialect designed to enforce memory safety; that is to prevent buffer
overruns, memory leaks, and other memory related problems endemic in programs
written in pure C [63]. Region-based memory access is used to ensure the safety of
memory operations [63].
In Cyclone, all pointer types are annotated with the region they reference either manu-
ally in the code by the programmer or automatically by an inference engine at compile
time [50]. The Cyclone compiler then uses the region annotations to determine if the
memory pointed to by any given pointer is still allocated and if assignment to the lo-
cation is permitted [50, 117]. The Cyclone type system employs effect annotations on
function signatures to determine which regions the function may access [50]. These ef-
fects are used to ensure that pointers cannot escape through existential types [50]. It is
conceivable that such a system could be used to reason about parallelism by computing
dependency information based on these effects, but this has not been explored to date.
Significant amounts of C code have been ported to the Cyclone platform and the pro-
grammer burden of this porting has proven minimal thanks to the annotation inference
engine [63, 50]. It is likely that such a system would require significant modification if
it were to be used with object-oriented languages, but this project demonstrates that,
with an appropriate inference engine, annotation heavy language syntax can be used
with minimal burden on the programmer. It also shows that reasoning about memory
interactions is possible, even in highly unstructured languages such as C. The Cyclone
region-based memory model is more general than the ownership types or unique alias
models of encapsulation enforcement and so may contribute useful concepts towards
reasoning about data dependencies, and so, parallelism.
The ownership and effect system I have used to reason about side-effects and inherent
parallelism has been applied only to the safe subset of the C# language. The unsafe
subset of C# adds pointers to the language syntax and disables some of the language's
automatic memory management. The use of pointers and pointer arithmetic means
that it is much harder to track where in the stack or heap a pointer is pointing. This
means it becomes difficult to determine what part of the stack or heap is being read
or written via the pointer. Cyclone uses a region model that is less structured than
the Ownership Types memory model; Cyclone regions do not have an implicit nesting
hierarchy associated with them as ownership contexts do. This less structured model is
ideally suited to modelling memory interactions in highly unstructured unsafe code. In
the future, when looking to expand Zal to include unsafe code, Cyclone regions may be
a useful starting point. For example, Cyclone has a syntax for annotating pointers with
region information which may be suitable for use with ownership contexts. Cyclone has
mechanisms for validating constraints on the areas of the heap and stack to which a
pointer may refer.
8.6.3 Scala
Scala was developed as a research programming language at the Ecole Polytechnique
Federale de Lausanne and it has subsequently gained some acceptance in the wider
programming community [93]. The core of the language is based on the object-oriented
programming paradigm, but with the addition of functions as first-class citizens (in-
cluding higher-order functions, partial evaluation, and continuations) [93]. While the
blend of paradigms and features in Scala is interesting, the most relevant feature of the
language to this thesis is its concurrency model.
In Scala the basic unit of concurrency is the actor. Each actor runs on its own
lightweight thread [52]. Communication between actors is achieved through message
passing with non-blocking send and blocking receive semantics and pattern matching in
the receiver to decide how to process the received messages [52]. Lightweight threads
are virtual machine artefacts which are dynamically mapped onto operating system
threads and processes by the virtual machine during program execution [52]. This
model of computation has achieved great success in the Erlang functional program-
ming language [7] which is well regarded in both academia and industry and has been
gaining popularity. I have focused on discussing Scala rather than Erlang because it
demonstrates how the Erlang parallelism model can be modified for use in an object-
oriented language. Actor-style parallelism is well suited to exploiting coarse-grained
parallelism between entities in a system, but it does not lend itself well to the exploita-
tion of opportunities for fine-grained parallelism. Further, existing programs may need
significant restructuring to fit the actor concurrency model.
Scala and Erlang provide tools that allow the programmer to annotate which parts of
their program can be safely parallelized, but they do not support verification that the
parallelism added by a programmer is actually safe to exploit. Scala and Erlang also
take care of deciding how to exploit the parallelism through the language supplied
implementation of lightweight threads. The major difference is that these languages
require the programmer to restructure their program to fit the parallelism pattern built
into the language and they do not supply tools to facilitate this restructuring nor do
they validate any parallelism explicitly exposed by the programmer.
8.6.4 High Productivity Computing Languages
The Defense Advanced Research Projects Agency in the United States has funded a
High Productivity Computing Systems (HPCS) research program over the last few
years [35]. The goal of this project is to develop technology to allow a multi-petaflop
computer system to be built along with tools to allow programmers to efficiently write
scientific and cryptographic applications to be run on the new hardware. A number
of companies participated in this project and prototype development of three systems
from Sun Microsystems, IBM, and Cray was funded [35]. The development of these
prototypes produced three new programming languages: Fortress [115], X10 [103], and
Chapel [33] from each of the companies respectively. Each of these languages includes a
number of features for expressing parallel algorithms, but there is no verification of the
correctness of the parallelism encoded in a program or for identifying where additional
exploitable parallelism may be found in programs.
8.6.5 Spec#
Spec# is a superset of C# which allows programmers to encode pre-conditions, post-conditions,
a list of exceptions that can be thrown, and declarations about the variables and
fields modified by a method [13]. The declarations are used to facilitate the enforce-
ment of object invariants and design assumptions by the compiler and by the runtime
system called Boogie [13]. The overall goals of the language are to reduce the number
of errors that go undetected at development time to reduce software development costs
and to increase the quality of software produced [13]. The syntactic extensions in Spec#
are quite verbose and can impose a significant additional burden on the programmer.
Some representation exposure also occurs through the method contracts. This is not
generally desirable due to all the well known problems with abstractions leaking im-
plementation details. Spec# is an example of a language extension which allows Hoare
Logic-style pre and post conditions to be enforced in a program. There has been some
work inspired by Spec# and Boogie which has used similar techniques to reason about
parallelism. One example of a system which does this is Oval [74] which was discussed
in Section 8.4.2. Overall, the Spec#/Boogie style of enforced contracts allows a number
of program invariants and properties to be captured in a program and validated. Fully
generalized pre and post conditions, such as those found in Spec#, are powerful, but writ-
ing these conditions can impose a significant additional overhead on programmers [74].
Spec# serves to demonstrate the practicality of Separation Logic driven approaches, but
does not directly contribute to systems for reasoning about parallelism.
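To give a flavor of the annotation style discussed above, a Spec#-style method contract reads roughly as follows (a sketch only; the field, method, and condition shown are invented for this example, and the exact Spec# grammar may differ in detail).

class Account
{
    int balance;

    // Sketch of a Spec#-style contract: pre- and post-conditions are written
    // after the signature and checked by the compiler and by Boogie.
    public int Withdraw(int amount)
        requires amount > 0;
        requires amount <= balance;
        ensures balance == old(balance) - amount;
    {
        balance -= amount;
        return balance;
    }
}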
8.7 Alternative Concurrency Abstractions
Threads are one of the most popular concurrency abstractions in mainstream program-
ming languages today. Threads are an excellent abstraction for certain types of par-
allelism or for situations where precise control of the data synchronization process is
required. There are, however, a number of other concurrency abstractions provided
as libraries or language extensions which provide more abstract parallelism constructs.
This section will discuss several of these systems. The key point that will be demon-
strated is that there is little or no support in these tools for ensuring program semantics
are preserved when sequential programs are transformed into parallel programs using
these constructs.
8.7.1 Futures
The Futures model of parallel computation was first proposed as part of the MULTILISP
programming language in 1985 [53]. More recently, futures have been added to the
Java Standard Edition platform in version 5.0 [122] and the .NET Platform in version
4 [80]. When a future is evaluated, the expression immediately returns an undetermined
placeholder and a new thread of computation is started to evaluate the expression required
to produce the actual value [53]. When the value is computed, it replaces the undetermined
placeholder. If the undetermined placeholder is accessed before the computation of
the required value is completed, the thread performing the read is suspended until the
future completes execution and returns a value to replace the placeholder [53].
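In .NET 4 terms, this model corresponds to Task<T> from the Task Parallel Library. The following sketch (ExpensiveComputation is a placeholder invented for this example) shows the placeholder behavior just described.

using System;
using System.Threading.Tasks;

class FutureExample
{
    static int ExpensiveComputation()
    {
        int sum = 0;
        for (int i = 1; i <= 1000000; i++)
            sum += i;
        return sum;
    }

    static void Main()
    {
        // Start the computation on another thread; 'future' acts as the
        // undetermined placeholder for the eventual value.
        Task<int> future = Task.Factory.StartNew(() => ExpensiveComputation());

        // ... other work can proceed here in parallel ...

        // Reading Result blocks the caller until the value is available,
        // mirroring the touch semantics of MULTILISP futures.
        Console.WriteLine(future.Result);
    }
}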
This model of computation works well in purely functional programming languages
where there are no side-effects from method invocation. Unfortunately, in Java and
C#, expression evaluation can result in observable side-effects which are not captured
in the future model of computation as it was originally implemented [122]. Welc, Ja-
gannathan, and Hosking implemented a system of object versioning and dependency
determination that relieved Java programmers using futures of manually synchronizing
all access to shared state. This accounts for the side-effects of futures [122]. Unfortu-
nately, the solution proposed is complex and improves performance only when specific
types of parallelism are present. My Zal compiler makes use of futures to expose task
parallelism discovered in programs.
8.7.1.1 Task Parallel Library & Parallel LINQ
In 2010, Microsoft released the Task Parallel Library (TPL) and Parallel LINQ (PLINQ)
as part of the .NET 4 platform [79, 80]. The TPL provides facilities for exploiting data
parallelism and task parallelism in .NET Platform languages such as Visual Basic.NET
and C#. Data parallelism can be exploited through the use of parallel for and foreach
loops while task parallelism can be expressed using tasks which allow for constructs such
as one-way calls and futures. C# 3 and Visual Basic 9 introduced Language Integrated
Query (LINQ), which allowed data sources to be manipulated in the language using
an SQL-like syntax. PLINQ provides an API which can be used to parallelize the
execution of a LINQ query (see Section 4.5).
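The following C# sketch (the Normalize method and the data are placeholders invented for this example) shows the two facilities: a data-parallel loop via Parallel.ForEach and an ordinary LINQ query parallelized with AsParallel().

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class TplAndPlinqExample
{
    static double Normalize(double x) { return x / 100.0; }

    static void Run(IList<double> samples)
    {
        // TPL data parallelism: the loop body is applied to the elements of
        // 'samples' on multiple threads; correctness is not checked.
        var results = new double[samples.Count];
        Parallel.ForEach(samples, (value, state, index) =>
        {
            results[index] = Normalize(value);
        });

        // PLINQ: an ordinary LINQ query is parallelized by AsParallel().
        var positives = samples.AsParallel()
                               .Where(v => v > 0)
                               .Select(Normalize)
                               .ToList();
    }
}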
Both the TPL and PLINQ supply only methods which facilitate the exploitation of
parallelism; they do not provide any means of checking the correctness of programs
parallelized using the API. This means these tools are useful for programmers with
specialist parallelism training, but they do not help programmers without this training
correctly parallelize their applications. My Zal compiler makes use of the parallel
foreach loops to implement data parallel foreach loops. They are also used in the
supplied implementations of my enhanced foreach loop API.
8.7.1.2 OpenMP
OpenMP is a set of APIs and syntactic extensions for C, C++, and FORTRAN which is
designed to make it easier to exploit fine-grained parallelism in applications on shared
memory computers [94]. Programmers are required to explicitly annotate the par-
allelism in their code as well as provide any necessary synchronization between the
different threads of execution [94]. The annotation system is large, complex, and re-
quires background knowledge of parallelism and synchronization techniques. Further,
the programmer is responsible for understanding different patterns of interaction which
may occur and providing appropriate guards in the code to prevent races and deadlocks.
Programs must be transformed so that the parallelism to be exploited is exposed in a
manner consistent with the provided APIs. Once transformed, the lack of validation
of the correctness of the parallelized program means that errors in the parallelization
can trigger unrelated error messages making debugging the program quite difficult.
OpenMP helps to simplify the syntax required to expose and manage parallelism, but
it does not help programmers unfamiliar with parallelism find and safely exploit inher-
ent parallelism.
8.7.2 Message Passing
Message passing is a common parallelism pattern where different processes or threads
communicate by passing messages by value between each other. Features supporting
this style of parallelism can be found in Smalltalk [67] and have been included in other
object-oriented systems [2, 67, 121]. In this section I will compare and contrast my
proposed system with several different message passing parallelism models.
8.7.2.1 Actor Model
The actor pattern of concurrency was first described by Hewitt, Bishop, and Steiger in
1973 [55]. This work was subsequently generalized by a number of other researchers
including Agha who proved the composability of actor based systems as well as pre-
senting a well formulated general actor system for distributed computing [2]. The
fundamental principle of the model is that actors communicate by message passing
with asynchronous message sending and synchronous receiving [121]. This idea of com-
munication via message-passing also formed a part of the original object-oriented pro-
gramming paradigm formulation and the associated language — Smalltalk [47, 67, 121].
Communication by message passing requires that only pass-by-value semantics apply
to the function invoked using it [67, 121]. Further, it is also necessary to ensure that
objects are fully encapsulated so that the shared mutable state associated with a
specific object may be modified only via that object's interface [2, 121]. In popular mod-
ern object-oriented languages such as Java and C], message passing has been reduced
to direct function invocation and parameters may now be object references creating the
complicated mutable state access patterns seen today [121]. Actor model research has
most recently focussed on distributed computing and so can model parallelism. The
actor model maps well to the object-oriented paradigm and would allow for easy ex-
ploitation of coarse-grained parallelism. Further, the synchronization and parallelism
are completely transparent to the user. Unfortunately, the pass-by-value semantics
required by this model are prohibitively expensive for programs written in modern
object-oriented languages which rely on pass-by-reference semantics for performance
reasons. The strict encapsulation required would prohibit the use of many common
object-oriented design patterns as discussed by Nageli [90]. There has been some re-
search into combining Ownership Types with the actor concurrency model [26, 110].
Ownership is used to ensure that the messages passed between actors, and the
actors themselves, are fully encapsulated as required.
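To make the contrast with modern call semantics concrete, the following minimal C# sketch (the Counter class and Increment method are illustrative only) shows how passing an object reference lets a callee mutate state the caller still observes; this is precisely the sharing that the actor model's pass-by-value messaging and strict encapsulation rule out.

```csharp
using System;

class Counter
{
    public int Value;
}

class Program
{
    // The callee receives a reference, not a copy, so it can mutate
    // state the caller still sees -- the opposite of the isolation
    // that pass-by-value message sends give the actor model.
    static void Increment(Counter c)
    {
        c.Value++;
    }

    static void Main()
    {
        var shared = new Counter();
        Increment(shared);               // mutates the caller's object
        Console.WriteLine(shared.Value); // prints 1: the state was shared
    }
}
```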
8.7.2.2 MPI
The Message Passing Interface (MPI) is the most popular scalable manual parallelization
tool in use today. It consists of an API for C, C++, and FORTRAN. The interface was
designed to provide a general message passing facility to its target platforms [77]. The
most significant problem with the use of MPI to express parallelism is the need to
significantly refactor existing programs and to provide explicit synchronization code.
The programmer is not assisted with these tasks which can be complex. Some have
criticized the MPI library for the complexity of its API and the lack of features to
specifically support parallelism [77], but the API provides all of the functionality re-
quired and has been successfully used to express parallelism. MPI helps to simplify both
the implementation of parallelism and the management of distributed parallelism.
Unfortunately, the amount of information the API demands means that the programmer
must be very familiar with parallelization and synchronization to use it properly.
8.7.2.3 Jade
Jade is a Java-like language which provides support for a message passing style of par-
allelism via parallel classes [37]. Parallel classes are somewhat like actors in that each
parallel class executes on its own thread and method invocations occur asynchronously.
As with MPI itself, code refactoring is required to make use of Jade’s parallelism fea-
tures. Further, programmers must explicitly identify which classes should be parallel
classes. This allows programmers to express parallelism, but there is no strong validation
to ensure that dependencies which could lead to data races cannot be created via
message passing. In addition, significant code refactoring may be required to increase
cohesion, reduce coupling, and make programs adhere to message passing conventions.
8.8 Object-Oriented Paradigm Considerations
Object-oriented programming employs a number of different techniques to reduce devel-
opment effort and increase reusability. Design patterns, most prominently championed
by Gamma, Helm, Johnson, and Vlissides [44], can offer opportunities for coarse-grained
parallelism. The prevalence and nature of inherent parallelism in modern object-
oriented programs are not well understood. In addition, many object-oriented languages
contain some meta-programming facilities. These facilities are frequently implemented
outside the language itself and they can cause encapsulation breaches amongst other
complications [21]. Modern use of object-oriented programming also emphasizes sepa-
rate compilation and software componentization which creates another set of challenges
for reasoning about parallelism. These different aspects of object-oriented programming
will be briefly explored in this section to demonstrate their impact on parallelism and
the need to consider them when designing solutions to the problem of how to express
parallelism.
8.8.1 Design Patterns
Modern software engineering makes great use of design patterns, standard arrangements
of object interaction, to form solutions to common classes of problems [44]. Many
different patterns therefore exist, addressing many different classes of problems.
For example, there are patterns for concurrency, object creation, and
collection processing [23]. Many of these patterns are designed with object-oriented
programming in mind and so they are one of the paradigm specific considerations
which influenced the design of my reasoning system [44, 23].
These patterns can provide extra information about how a program works and the
dependencies it contains, offering a much higher-level understanding of programs and
code fragments. This higher-level knowledge could be used to recognize
opportunities to employ coarse-grained parallelism. Further, the patterns provide
contextual information which may simplify some of the analysis required to exploit fine-
grained parallelism.
Currently, there is little knowledge about the prevalence of these patterns in actual
general purpose software and the amount of parallelism that could be extracted from
them. The most extensive study of parallelism patterns to date has been undertaken by
researchers at the University of California at Berkeley who have identified 13 common
parallelism patterns [9].
8.9 Summary
In this chapter I have presented a discussion of publications in six different areas which
are directly related to the work in this thesis.
Some of these systems provide abstractions that are well suited to reasoning about
data dependencies and parallelism. Others provide excellent abstractions for expressing
parallelism and frameworks for understanding program behavior. None of these systems
combines abstraction and composition with parallelization and correctness checking to
produce a framework which helps both programmers and automated tools to reason
about inherent parallelism.
Chapter 9
Conclusion & Future Work
9.1 Summary
In this thesis, I have demonstrated how side-effects can be expressed in an abstract
and composable manner using the representation hierarchy present in programs writ-
ten using modern imperative object-oriented languages. I also demonstrated how data
dependencies could be inferred from the overlap of effects expressed using my effect
system. I have shown how my proposed effect system can be implemented using Own-
ership Types in Chapter 3 and have argued for the soundness of my proposals based
on previously published type systems in Chapter 5. Chapter 5 also sketched proofs of
the correctness of the runtime ownership relationship testing algorithms and sufficient
conditions for parallelization which are at the core of this thesis.
Chapter 4 discussed how my proposed type and effect system could be applied to
version 3.0 of the C# programming language to produce a language I have named Zal.
This application involved adding ownership type and effect information to a number of
syntactic constructs not previously considered in the literature. I have extended a full
C# compiler to produce a compiler for Zal. The design of this compiler was presented
in Chapter 6. In addition to the compiler itself, I wrote several additional support
libraries used to implement Zal and runtime ownership tracking in C#. In Chapter 7, I
presented a number of representative sample applications which showed that my system
was able to detect a number of different forms of inherent parallelism with reasonable
annotation and runtime overheads. Finally, in Chapter 8, I compared and contrasted
my work with other relevant related works from the literature. None of these other
systems combines abstraction and composition with parallelization and correctness
checking to produce a framework which helps both programmers and automated tools
to reason about inherent parallelism. As a result, this thesis makes a contribution to
the current state-of-the-art in parallelization techniques.
9.2 Contributions
The focus of this thesis is how to facilitate reasoning about the inherent parallelism
in strongly, statically typed, imperative, object-oriented languages. The main idea is
an abstract and composable effect system that can be used to specify side-effects as
part of a method’s signature and to then use this effect information to detect data
dependencies. The major contributions of this thesis are:
• The design of the Zal language — a novel language adapting features from Ownership
Types to facilitate capturing, abstracting, and validating side-effects in a real
language on real programs, enabling the detection of inherent parallelism.
• Developing and proving sufficient conditions for parallelism in terms of ownership
effects and relationships for a number of different parallelism patterns.
• The design and implementation of a runtime ownership system — a system which
complements static reasoning about ownership context relationships to facilitate
conditional parallelization of code blocks.
• Empirical evaluation of the systems designed to demonstrate the practicality of
the approach proposed.
This thesis has also made a number of smaller, more technical contributions which
have arisen during the validation of my ideas on realistic applications. I have applied
ideas from Ownership Types to the full C# language, including handling static fields
and methods, indexers, properties, delegates, and user-defined value types: language
features not previously discussed in the ownership literature. As part of the process
of making and validating these contributions, I have created a compiler for the Zal
language, which is C# version 3.0 extended with my ideas. In building
this compiler, I added support for arbitrary type parameters to the GPC# research
compiler [32]. This compiler and its type parameter infrastructure are available from
the MQUTeR website [32]. Finally, I have demonstrated, through the application of my
ideas to representative sample applications, that my proposed approach to reasoning
about side-effects and parallelism is feasible.
9.3 Conclusions
In this thesis I have developed a novel framework which can be used to discover inherent
parallelism in programs written using modern, imperative, object-oriented languages. I
have demonstrated the feasibility of my proposed techniques through their application
to the C# programming language and a number of realistic sample applications. This
work represents a significant first step towards reasoning about side-effects and data
dependencies in an abstract and composable manner. Do I claim that these techniques
are ready for practical application? Not yet; after analyzing the validation results I have
identified three categories of potential barriers to practical application:
• annotation overhead — as with other kinds of type annotations, Ownership
Types can be criticized for increasing the amount of code a developer needs to
write and complicating the construction of valid types in programs [1, 85];
• loop parallelism limitations — there are a number of syntactic and semantic
restrictions on the loops which my system can parallelize; and
• language limitations — the language model restricts the syntax and semantics
allowed in the base programming language extended with ownership annotations.
There are ways of addressing each of these categories of potential barriers. In the rest
of this section I elaborate on some of the possible avenues for addressing these potential
barriers.
9.3.1 Memory Overhead
The memory overhead measured for the runtime ownership system is non-trivial. It
would be interesting to explore the sources of this overhead more closely. It may be
possible to significantly reduce the overhead by modifying the Common Language
Runtime (CLR) to help facilitate this runtime ownership tracking. It may also be possible
to further optimize the packing of the data structures to minimize memory use once
the runtime ownership tracking fields are added to objects.
9.3.2 Annotation Overhead
Like any type annotation system, Ownership Types can be criticized for increasing the
amount of code a developer needs to write and complicating the construction of valid
types in programs [1, 85]. The cost of adding type annotations needs to be balanced by
the benefits obtained from doing so. Further reducing the annotation overhead would
contribute significantly towards making my proposals practical for everyday program-
ming tasks. In this section I will discuss ownership inference, ownership transfer, and
temporary owners for transient objects as avenues for reducing the annotation burden.
9.3.2.1 Ownership Inference
At present the type system used to provide the framework for localized reasoning about
program properties requires all types to be parameterized with context information.
This imposes a significant burden on the programmer since all types in an application
require annotation with ownership information. One way to relieve at least some of
this burden would be to build an ownership inference system. Ownership inference is
currently an active field of research in the Ownership Types community [85, 84, 1] and
it promises to be an interesting avenue for future work.
9.3.2.2 Ownership Transfer
With the type system currently included in my system, there are some programs which
cannot be properly annotated with ownership parameters. For example, annotating
a program which implements an abstract factory pattern would not be possible. In
2006, Nageli published a master’s thesis [90] which studied how many of the Gang of Four
design patterns the then-current state-of-the-art ownership systems could implement.
He proposed a number of additional language features which could greatly increase the
number of design patterns that these systems could handle; central to these extensions
was the notion of ownership transfer. Ownership transfer would allow an object’s
owner to be changed after declaration and could be used to implement a number of
different patterns such as the Factory. Some work has been done on ownership trans-
fer in Universe Type systems by Wrigstad [127] as well as by Muller and Rudich [89].
Quite how these ideas could be retrofitted onto Zal remains to be seen, but doing so
would be an interesting avenue of future study and would allow more programs to be
successfully annotated using my framework.
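To make the Factory difficulty concrete, the sketch below renders the conflict in plain C# (the class names are illustrative, and the ownership constraints appear only as comments, since no transfer syntax exists to express them):

```csharp
using System;

class Widget { }

class WidgetFactory
{
    // Under the current discipline the widget's owner is fixed when it
    // is allocated; a natural choice here would be the factory's own
    // representation context (hypothetically, "owned by this").
    public Widget Create()
    {
        return new Widget();
    }
}

class Client
{
    // The caller wants the widget inside its own representation, but
    // the owner was fixed inside Create(). Without ownership transfer
    // no single owner annotation satisfies both the factory and every
    // possible caller.
    public void Use(WidgetFactory factory)
    {
        Widget w = factory.Create();
        Console.WriteLine(w);
    }
}
```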
9.3.2.3 Temporary Owners for Transient Objects
Programs written in modern imperative object-oriented languages sometimes create
transient objects as part of a computation. These transient objects are not stored
in any fields and do not outlive the scope in which they are created. They are used
for a number of purposes from marshalling data to converting data types. Currently
Zal, as with most other ownership systems, lacks a special context which can be
used to own these temporary objects. They must be assigned a regular owner as is
any other object. This can cause an increase in the number of false-positive data
dependencies reported when testing effect sets for disjointness. Further, the lack of a
special owner does not capture the fact that a given object is expected to be transient
by the programmer. Capturing this information would allow the compiler to detect
if the programmer violates their assumptions about the use of such objects. This, in
turn, may help detect bugs during compilation. How to add such a transient owner
is not currently clear. It may be possible to adapt some of the ideas presented in
“Existential Owners for Ownership Types” to this purpose [128]. Ensuring the safety
and correctness of Zal while preserving the ability to reason about context relationships
are both areas for future study.
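As a small illustration of the problem described above, consider the following C# sketch (the Formatter class is illustrative, and the ownership note in the comment is hypothetical): the builder is created, used, and discarded within a single scope, yet it must still be given an ordinary owner, so its effects are indistinguishable from effects on longer-lived objects sharing that owner.

```csharp
using System.Text;

class Formatter
{
    // The StringBuilder below is transient: it never escapes this method
    // and dies with the scope that created it. It must nevertheless be
    // assigned a regular owner (hypothetically "owned by this"), so its
    // reads and writes appear in the method's effect set and can collide
    // with effects on unrelated, longer-lived objects with the same owner.
    public string Describe(int id, string name)
    {
        var sb = new StringBuilder();
        sb.Append(id).Append(": ").Append(name);
        return sb.ToString();
    }
}
```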
9.3.3 Loop Parallelism Limitations
There are a number of syntactic and semantic restrictions on the loops which my sys-
tem can parallelize. Firstly, there is a need to look beyond data parallel loops. All
of the sufficient conditions for loop parallelism in the framework apply only to data
parallel loops. Secondly, the handling of the element uniqueness sufficient condition in
my framework at present is not ideal. Finally, only foreach and enhanced foreach
loops are handled by the framework I have proposed. In this section, I present ideas
for possible future avenues of research which may help to address some of these short-
comings.
9.3.3.1 Improved Handling of Collections
Consider the simplest expression for reading an element from an array, which has the
form a[i]. When computing the read and write effects, the access is reported as a read
of the owner of the array a, and that effect does not change with the value of i supplied.
There are no means of describing effects at the level of individual elements of a collection.
Adding such support would also improve enforcement of the collection
uniqueness requirement and so allow this programmer assertion to be validated.
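A plain C# sketch (the ScaleHalves method is illustrative only) shows the consequence: the two loops below touch disjoint halves of the array and could safely run in parallel, but because each access is described only as an effect on the owner of a, the effect sets overlap and a conservative analysis must report a dependency.

```csharp
class ElementEffects
{
    // Both loops touch disjoint halves of the array and could run in
    // parallel, but an effect system that describes each access only as
    // a read/write of the owner of a must conservatively report a conflict.
    static void ScaleHalves(double[] a)
    {
        int mid = a.Length / 2;
        for (int i = 0; i < mid; i++)
            a[i] *= 2.0;          // write effect: owner of a
        for (int j = mid; j < a.Length; j++)
            a[j] *= 3.0;          // write effect: owner of a (identical)
    }
}
```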
One possible avenue for addressing this problem would be to parameterize contexts in
effect sets with values in a manner similar to dependent types. This would allow effects
on collections to be parameterized with a value to describe which parts of the collection
are being modified.
Another possible avenue would be to adapt some of the ideas from Deterministic Parallel
Java (DPJ) for use in Zal [18]. This would avoid the complications associated with
dependent types. How to facilitate effect composition using DPJ style effect notations
is an open question which would have to be answered.
9.3.3.2 Light’s Associativity Test
In Chapter 2, I introduced loop parallelism by stating that there were three basic
classes of loop operations: map, reduce, and filter. The sufficient conditions presented
and proved in this thesis have focused almost exclusively on mapping operations — the
major source of parallelism in imperative programs. There are, however, performance
gains to be made through the parallelization of other loop patterns as well, although the
gains may be smaller than the gains obtained by parallelizing the execution of mapping
loops.
Of particular interest is the reduction pattern where the loop operation takes two ele-
ments of the collection and produces a single value. When such an operation is applied
across an entire collection, it reduces the collection to a single value. Examples of such
an operation might be the summation of a collection of integers or the concatenation
of a collection of strings. To parallelize such an operation, the reduction order needs to
be changed. Unfortunately, not all reduction operations will produce the same result
if the order of application to the collection is modified.
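A small C# example (the array contents are chosen only to expose rounding) makes the hazard concrete: regrouping a floating-point sum, as a parallel tree reduction would, changes the computed result because floating-point addition is not associative, while an integer sum would be unaffected.

```csharp
using System;
using System.Linq;

class ReductionOrder
{
    static void Main()
    {
        double[] xs = { 1e16, 1.0, 1.0 };

        // Sequential left-to-right fold: (1e16 + 1) + 1 == 1e16,
        // because each 1.0 is individually lost to rounding.
        double sequential = xs.Aggregate(0.0, (acc, x) => acc + x);

        // The grouping a parallel tree reduction might choose:
        // 1e16 + (1 + 1) == 1.0000000000000002e16.
        double regrouped = xs[0] + (xs[1] + xs[2]);

        Console.WriteLine(sequential == regrouped); // False
    }
}
```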
Mathematicians call operations associative when their order of application can be
changed without affecting the result computed, provided the operands remain in the
same order. Proving that an operation is associative is a non-trivial process in the
general case and is a problem studied in some detail by mathematicians. One asso-
ciativity test for an operation documented in the mathematical literature is Light’s
Associativity Test [29], later improved by Bednarek [14] and Kalman [66]. This test
allows an operator to be proved to be associative over a fixed domain in a mechani-
cal manner. The test requires the operation being tested to be applied to a number
of pairs of elements in the domain being tested and the results compared. With the
addition of an effects system to a programming language, it would become possible to
determine when such a test could be safely performed without modifying the state of a
program. Such a test could be used to dynamically determine if a reduction operation
is associative, and so whether the loops employing it are amenable to parallelization. There would
be serious overheads involved in running such a test and determining when and if it
is worth testing an operation would require additional study. Once implemented, a
functioning runtime Light’s Associativity Test could be used in several different ways
including automated parallelization of reduction loops in specific circumstances and
verification of programmer supplied annotations asserting operation associativity.
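As a baseline for what such a runtime check involves, the sketch below (the IsAssociative helper is illustrative, and it is the naive exhaustive check rather than Light's test itself) tests associativity over a finite domain given the operation's full table; Light's test and its refinements reduce the number of comparisons needed by working from a set of generators, but the acceptance criterion is the same.

```csharp
using System;

static class AssociativityCheck
{
    // Exhaustive associativity check over a finite domain {0..n-1},
    // where table[x, y] encodes the operation x ∘ y. An operation is
    // associative iff (x ∘ y) ∘ z == x ∘ (y ∘ z) for all x, y, z.
    public static bool IsAssociative(int[,] table)
    {
        int n = table.GetLength(0);
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                    if (table[table[x, y], z] != table[x, table[y, z]])
                        return false;
        return true;
    }

    static void Main()
    {
        // Addition modulo 2 (exclusive or) is associative.
        int[,] xor = { { 0, 1 }, { 1, 0 } };
        Console.WriteLine(IsAssociative(xor));      // True

        // This table is not: (1∘1)∘1 = 0∘1 = 0 but 1∘(1∘1) = 1∘0 = 1.
        int[,] notAssoc = { { 0, 0 }, { 1, 0 } };
        Console.WriteLine(IsAssociative(notAssoc)); // False
    }
}
```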
9.3.4 Language Limitations
Finally, the language model restricts the syntax and semantics allowed in the base
programming language extended with ownership annotations. The system presented in
this thesis does not permit unrestricted references into the program stack nor does it
support explicit pointer types. To increase the number and style of languages to which
my techniques can be applied, it would be desirable to add support for these language
features. While this work would not make the system more practical for use in one of
the languages already considered, such as Java or C#, it would allow the techniques to
be applied to other less structured languages.
9.3.4.1 Liberalization of the Stack Model
The stack model used to develop the system for reasoning about side-effects, depen-
dencies and inherent parallelism in this thesis was restricted. I required the heap to be
orthogonal to the stack and access into the stack to be tightly controlled. While Java
and the safe subset of the C# language fit such a model, there are many other popular
imperative object-oriented languages which do not; for example, C++ and the unsafe
subset of C#.
There are a number of possible approaches that could be taken to liberalize the stack
model. One option would be to extend ownership to include both the stack and the
heap. One approach to doing this would be to have stack contexts which correspond
to the scopes of stack variables with the hierarchy of the contexts corresponding to the
scope nesting hierarchy. References into the stack would need to be annotated with
context parameters corresponding to the stack scope being referenced. Precisely how
such a system would operate requires additional study, but offers hope that the system
can be modified to operate on languages employing more liberal stack models.
9.3.4.2 Handling Unsafe Code Blocks
In implementing Zal, I consciously focused only on the safe subset of the C# language
— code which does not make use of pointers. It would be interesting to explore how
to extend Zal to also support unsafe code blocks. The unsafe language subset includes
pointers and it operates on a weaker stack model where references to arbitrary stack
locations can be taken and so such an extension would likely be non-trivial.
It would be interesting to explore if a notation similar to that used in Cyclone [117]
could be used to annotate pointer data types with information about what areas of
the stack and heap they can refer to. The ownership parameters on a pointer type
could be used to determine what areas of memory are being read when a de-referencing
operation is performed, not unlike the reading of fields from reference types in Zal.
9.3.4.3 Multiple Ownerships to Model Communication Channels
One of the most recent developments in the field of Ownership Types is type systems
which permit objects to have multiple owners [25]. These type systems can provide
greater flexibility than single ownership systems. This means that they can handle
patterns not easily expressed in single ownership languages. One of the interesting
questions is what multiple ownership means or represents in a system, like that proposed
in this thesis, which uses contexts to reason about side-effects and dependencies.
In my opinion, one possible interpretation is that multiple owners represent communication
channels between objects in two or more different contexts, the end points of the channel
being the contexts’ owners. Such a communication channel could be used in many
ways; one example would be message passing style communication between different
threads of execution in a parallel program. Figure 9.1 shows two objects with single
owners (labelled a and b) communicating via an object jointly owned by a and b.
Figure 9.1: Illustration of two singly owned objects a and b communicating via a jointly owned context, labelled a&b.
9.4 Summation
In this thesis I have presented a system which combines abstraction and composition
with parallelization and correctness checking to produce a novel framework for reason-
ing about inherent parallelism in imperative object-oriented programs. This framework
provides a new way of looking at data dependency analysis, through the use of side-
effects rather than pairwise comparison of statements. This helps both programmers
and automated tools reason about inherent parallelism. The key idea behind this frame-
work and this new perspective is the extension of the programming language to create
a scaffold for localized reasoning about program properties.
I have demonstrated through my proof-of-concept implementation that these ideas are
feasible. Through reflection on the results of my system validation, I believe that
there are three main barriers to the use of my ideas in a practical system: annotation
overhead, loop parallelism limitations, and language limitations. All of these barriers
relate to a cost-benefit analysis: is the cost of annotating the program with additional
information worth the benefits obtained from the parallelization? Earlier in this chapter,
I proposed a number of ideas which could be explored to help reduce the annotation
cost as well as increase the amount of parallelization benefit. However, the language
supported framework for localized reasoning about program properties is not limited to
applications in the domain of parallelism. A number of techniques in computer science
could benefit from this approach including program optimization, program verification,
and memory management. With additional work, the cost of annotating a program may
be reduced while the benefits obtained from simplified reasoning about program
invariants and behavior increase. The net result would make the benefit outweigh the
cost, thus producing a system practical for use in everyday computing.
Bibliography
[1] Marwan Abi-Antoun and Jonathan Aldrich. Compile-time views of execu-
tion structure based on ownership. In Proceedings of the International Work-
shop on Aliasing, Confinement, and Ownership in Object-oriented Programming
(IWACO) 2007, pages 93–104, 2007.
[2] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems.
PhD thesis, Massachusetts Institute of Technology, 1985.
[3] Jonathan Aldrich and Craig Chambers. Ownership domains: Separating aliasing
policy from mechanism. In ECOOP 2004 — Object-oriented Programming, pages
1–25, 2004.
[4] V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan. Software pipelining. ACM
Computing Surveys, 27(3):367–432, 1995.
[5] P. S. Almeida. Balloon types: Controlling sharing of state in data types. In 11th
European Conference on Object-Oriented Programming, pages 48–64, 1997.
[6] L. O. Andersen. Program Analysis and Specialization for the C Programming Language.
PhD thesis, University of Copenhagen, Denmark, DIKU Report 94/19, 1994.
[7] J. Armstrong. The development of Erlang. In Proceedings of the second ACM
SIGPLAN International Conference on Functional Programming, pages 196–203,
New York, NY, USA, 1997. ACM.
[8] Pedro V. Artigas, Manish Gupta, Samuel P. Midkiff, and Jose E. Moreira. Au-
tomatic loop transformations and parallelization for Java. In Proceedings of the
14th international conference on Supercomputing, pages 1–10, New York, NY,
USA, 2000. ACM.
[9] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis,
Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker,
John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The land-
scape of parallel computing research: A view from Berkeley. Technical Report
UCB/EECS-2006-183, Electrical Engineering and Computer Sciences, University
of California at Berkeley, December 2006.
[10] U. Banerjee, R. Eigenmann, A. Nicolau, and D.A. Padua. Automatic program
parallelization. Proceedings of the IEEE, 81(2):211–243, February 1993.
[11] Utpal K. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic
Publishers, 1988.
[12] Gilad Bracha. Pluggable type systems. In Proceedings of the OOPSLA 2004 Workshop
on Revival of Dynamic Languages, 2004.
[13] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# programming
system: An overview. In Proceedings of the Construction and Analysis of Safe,
Secure, and Interoperable Smart Devices International Workshop, pages 49–69,
2004.
[14] A. R. Bednarek. An extension of Light’s associativity test. The American Math-
ematical Monthly, 75(5):531–532, May 1968.
[15] A. J. Bernstein. Analysis of programs for parallel processing. IEEE Transactions
on Electronic Computers, EC-15(5):757–763, October 1966.
[16] Graham M. Birtwistle, Ole-Johan Dahl, Bjørn Myhrhaug, and Kirsten Nygaard.
SIMULA Begin. Auerbach Publishers Inc., Philadelphia, PA, 1973.
[17] W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, and T. Lawrence.
Parallel programming with Polaris. Computer, 29(12):78–82, December 1996.
[18] Robert L. Bocchino, Jr., Vikram S. Adve, Danny Dig, Sarita V. Adve, Stephen
Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Simmons, Hyojin Sung,
and Mohsen Vakilian. A type and effect system for Deterministic Parallel Java.
In OOPSLA ’09: Proceedings of the 24th ACM SIGPLAN conference on Object
oriented programming systems languages and applications, pages 97–116, New
York, NY, USA, 2009. ACM.
[19] Chandrasekhar Boyapati. SafeJava: A Unified Type System for Safe Program-
ming. PhD thesis, Massachusetts Institute of Technology, 2004.
[20] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for
safe programming: Preventing data races and deadlocks. In OOPSLA ’02: Pro-
ceedings of the 17th ACM SIGPLAN conference on Object-oriented programming,
systems, languages, and applications, pages 211–230, New York, NY, USA, 2002.
ACM.
[21] G. Bracha and D. Ungar. Mirrors: Design principles for meta-level facilities in
object-oriented programming languages. In 19th annual ACM SIGPLAN con-
ference on Object-Oriented Programming, Systems, Languages, and Applications,
pages 331–344. ACM Press, 2004.
[22] Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of
sophisticated points-to analyses. SIGPLAN Not., 44(10):243–262, 2009.
[23] B. Bruegge and A. H. Dutoit. Object-Oriented Software Engineering Using UML.
Pearson Education, Upper Saddle River, NJ, 2 edition, 2004.
[24] Nicholas Cameron. Existential Types for Variance - Java Wildcards and Owner-
ship Types. PhD thesis, Imperial College London, 2008.
[25] Nicholas Cameron, Sophia Drossopoulou, James Noble, and Matthew Smith.
Multiple ownership. In Proceedings of the 2007 OOPSLA conference, volume 42,
pages 441–460. ACM, 2007.
[26] Dave Clarke, Tobias Wrigstad, Johan Ostlund, and Einar Broch Johnsen. Min-
imal ownership for active objects. In Proceedings of 6th Asian Symposium on
Programming Languages and Systems, pages 139–154. Springer-Verlag Berlin Hei-
delberg, 2008.
[27] David Clarke and Sophia Drossopoulou. Ownership, encapsulation and the dis-
jointness of type and effect. In OOPSLA ’02: Proceedings of the 17th ACM SIG-
PLAN Conference on Object-Oriented Programming, Systems, Languages and
Applications, pages 292–310, New York, NY, 2002. ACM Press.
[28] David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible
alias protection. In 13th ACM SIGPLAN conference on Object-Oriented Program-
ming, Systems, Languages and Applications, pages 48–64. ACM Press, October
18-22 1998.
[29] A. H. Clifford and G. B. Preston. The Algebraic Theory of Semigroups, vol-
ume 1 of Mathematical Surveys and Monographs. American Mathematical Society,
Providence, RI, 1961.
[30] N. H. Cohen. Type-extension type tests can be performed in constant time. ACM
Transactions on Programming Languages and Systems, 13(4):626–629, October
1991.
[31] A Craik and W Kelly. Using ownership to reason about inherent parallelism in
object-oriented programs. International Conference on Compiler Construction,
2010.
[32] Andrew Craik and Wayne Kelly. MQUTeR parallelism research.
http://www.mquter.qut.edu.au/par/, 2009.
[33] Cray Inc. Chapel Language Specification 0.795. Cray Inc., Seattle, WA, USA,
April 2010.
[34] D. Cunningham, S. Drossopoulou, and S. Eisenbach. Universe types for race
safety. In Verification and Analysis of Multi-threaded Java-like Programs, pages
20–51, 2007.
[35] Defense Advanced Research Projects Agency. High Productivity Computing Systems
(HPCS) Program Plan. Defense Advanced Research Projects Agency, 2006.
[36] Peter J. Denning and Jack B. Dennis. The resurgence of parallelism. Communi-
cations of the ACM, 53(6):30–32, June 2010.
[37] Jayant DeSouza and Laxmikant V. Kale. Jade: A parallel message-driven Java.
In G. Goos, J Hartmanis, and J. van Leeuwen, editors, Computational Science —
International Conference on Computer Science 2003, pages 760–769. Springer-
Verlag Berlin Heidelberg, 2003.
[38] Werner Michael Dietl. Universe Types — Topology, Encapsulation, Genericity,
and Tools. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.
[39] E. W. Dijkstra. Recursive programming. Numerische Mathematik, 2(1):312–318,
December 1960.
[40] Edsger Dijkstra. Go to statement considered harmful. Communications of the
ACM, 11(3):147–148, 1968.
[41] S. Drossopoulou, D. Clarke, and J. Noble. Types for hierarchic shapes (summary).
In P. Sestof, editor, European Symposium on Object-Oriented Programming 2006,
volume LNCS 3924, pages 1–6. Springer-Verlag Berlin Heidelberg, 2006.
[42] Nicu G. Fruja. Towards proving type safety of C#. Computer Languages, Systems
& Structures, 36:60–95, 2010.
[43] Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster, and Michael Hicks. Static
type inference for Ruby. In SAC ’09: Proceedings of the 2009 ACM symposium
on Applied Computing, pages 1859–1866, New York, NY, USA, 2009. ACM.
[44] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of
Reusable Object-Oriented Software. Addison-Wesley Professional,
1995.
[45] D. K. Gifford, P. Jouvelot, M. A. Sheldon, and J. W. O’Toole. Report on the
FX programming language. Technical Report MIT/LCS/TR-531, Massachusetts
Institute of Technology, Cambridge, MA, USA, 1992.
[46] Gina Goff, Ken Kennedy, and Chau-Wen Tseng. Practical dependence testing.
ACM SIGPLAN Notices, 26(6):15–29, 1991.
[47] Adele Goldberg and David Robson. Smalltalk-80: The Language. Addison-Wesley
Series in Computer Science. Addison-Wesley, Reading, MA, 1989.
[48] K. John Gough. The GPLEX scanner generator. Technical report, Queensland
University of Technology, December 2009.
[49] A. Greenhouse and J. Boyland. An object-oriented effects system. In Proceedings
of the European Conference on Object-Oriented Programming ’99, 1999.
[50] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-
based memory management in Cyclone. In 29th ACM SIGPLAN conference on
Programming Language Design and Implementation, pages 282–293, Berlin, Ger-
many, 2002. ACM.
[51] Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and
Monica S. Lam. Interprocedural parallelization analysis in SUIF. ACM Transac-
tions on Programming Languages and Systems, 27(4):662–731, July 2005.
[52] P. Haller and M. Odersky. Event-based programming without inversion of control.
In 7th Joint Modular Languages Conference (JMCL), volume LNCS 4228, pages
4–22. Springer-Verlag Berlin Heidelberg, 2006.
[53] R. H. Halstead. Multilisp: A language for concurrent symbolic computation.
ACM Transactions on Programming Languages and Systems (TOPLAS), 7:501–
538, 1985.
[54] Jonathan Harris, John A. Bircsak, M. Regina Bolduc, Jill Ann Diewald, Israel
Gale, Neil W. Johnson, Shin Lee, C. Alexander Nelson, and Carl D. Offner.
Compiling High Performance Fortran for distributed-memory systems. Digital
Tech. J., 7(3):5–23, January 1995.
[55] C. Hewitt, P. Bishop, and R. Steiger. A universal modular actor formalism for
artificial intelligence. In 3rd International Joint Conference on Artificial Intelli-
gence, pages 235–245, 1973.
[56] C. A. R. Hoare. An axiomatic basis for computer programming. Communications
of the ACM, 12(10):576–580, 1969.
[57] John Hogg. Islands: Aliasing protection in object-oriented languages. In OOP-
SLA ’91: Conference proceedings on Object-oriented programming systems, lan-
guages, and applications, pages 271–285, New York, NY, USA, 1991. ACM.
[58] Susan Horwitz. Precise flow-insensitive may-alias analysis is NP-hard. In ACM
Transactions on Programming Languages and Systems, volume 19, pages 1–6.
ACM, 1997.
[59] P. Hudak, J. Hughes, S. P. Jones, and P. Wadler. A history of Haskell: Being lazy
with class. In 3rd ACM SIGPLAN conference on the History of Programming
Languages, pages 12-1–12-55, San Diego, CA, 2007. ACM.
[60] Atsushi Igarashi, Benjamin C. Pierce, and Philip Wadler. Featherweight Java:
A minimal core calculus for Java and GJ. ACM Transactions on Programming
Languages and Systems, 23(3):396–450, May 2001.
[61] S. Ishtiaq and P. W. O’Hearn. BI as an assertion language for mutable data struc-
tures. In Conference Record of POPL 2001: The 28th SIGPLAN-SIGACT Sym-
posium on Principles of Programming Languages, New York, NY, 2001. ACM.
[62] Simon Jensen, Anders Møller, and Peter Thiemann. Type analysis for JavaScript. In
Jens Palsberg and Zhendong Su, editors, Static Analysis, volume 5673 of Lecture
Notes in Computer Science, pages 238–255. Springer Berlin / Heidelberg, 2009.
[63] T. Jim, G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang.
Cyclone: A safe dialect of C. In General Track: 2002 USENIX Annual Technical
Conference, pages 275–288, Monterey, CA, USA, 2002.
[64] S. P. Jones. Engineering Theories of Software Construction, chapter Tackling the
Awkward Squad: Monadic Input/Output, Concurrency, Exceptions, and Foreign-
Language Calls in Haskell, pages 47–96. IOS Press, 2001.
[65] H. V. Jula. ASM semantics for C# 2.0. In ASM’05: Proceedings of the 12th
workshop on abstract state machines, Paris, France, 2005.
[66] J. A. Kalman. Bednarek’s extension of Light’s associativity test. Semigroup Fo-
rum, 3:275–276, 1971.
[67] A. C. Kay. The early history of Smalltalk. In 2nd ACM SIGPLAN Conference
on the History of Programming Languages, pages 69–95, Cambridge, MA, USA,
1993. ACM.
[68] W. Kelly. Optimization within a Unified Transformation Framework. PhD thesis,
Faculty of Graduate Studies, University of Maryland, College Park, MD, 1996.
[69] Neel Krishnaswami and Jonathan Aldrich. Permission-based ownership: encap-
sulating state in higher-order typed languages. In PLDI ’05: Proceedings of the
2005 ACM SIGPLAN conference on Programming language design and imple-
mentation, pages 96–106, New York, NY, USA, 2005. ACM.
[70] Kazuhiro Kusano and Mitsuhisa Sato. A comparison of automatic paralleliz-
ing compiler and improvements by compiler directives. In Constantine Poly-
chronopoulos, Kazuki Fukuda, and Shinji Tomita, editors, High Performance
Computing, volume 1615 of Lecture Notes in Computer Science, pages 95–108.
Springer Berlin / Heidelberg, 1999.
[71] Peeter Laud, Tarmo Uustalu, and Varmo Vene. Type systems equivalent to
data-flow analyses for imperative languages. Theoretical Computer Science,
364(3):292–310, November 2006.
[72] Shih-Wei Liao, Amer Diwan, Robert P. Bosch, Jr., Anwar Ghuloum, and Mon-
ica S. Lam. SUIF Explorer: an interactive and interprocedural parallelizer. In
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice
of parallel programming, pages 37–48, New York, NY, USA, 1999. ACM.
[73] Yi Lu and John Potter. Protecting representation with effect encapsulation. In
POPL ’06: Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium
on Principles of Programming Languages, pages 359–371, New York, NY, 2006.
ACM Press.
[74] Yi Lu, John Potter, and Jingling Xue. Validity invariants and effects. In ECOOP
2007 — Object-Oriented Programming, pages 202–226. Springer-Verlag Berlin
Heidelberg, August 2007.
[75] Dror E. Maydan, John L. Hennessy, and Monica S. Lam. Efficient and exact data
dependence analysis. SIGPLAN Not., 26(6):1–14, 1991.
[76] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineer-
ing, 2(4):3–15, 1976.
[77] Message Passing Interface Forum. MPI-2: Extensions to the Message Passing
Interface. University of Tennessee, Knoxville, TN, USA, 1997.
[78] Microsoft Corporation. C# Language Specification Version 3.0. Microsoft Corpo-
ration, 2007.
[79] Microsoft Corporation. Introduction to PLINQ. http://msdn.microsoft.com/en-us/library/dd997425.aspx, April 2010.
[80] Microsoft Corporation. .NET Framework 4 — Task Parallel Library. http://msdn.microsoft.com/en-us/library/dd460717.aspx, April 2010.
[81] Microsoft Corporation. Potential pitfalls with PLINQ. http://msdn.microsoft.com/en-us/library/dd997403.aspx, April 2010.
[82] Microsoft Corporation. Samples for parallel programming with the .NET Framework 4. http://code.msdn.microsoft.com/ParExtSamples, May 2010.
[83] A. Milanova. Static inference of universe types. In International Workshop
on Aliasing, Confinement and Ownership in Object-Oriented Programming
(IWACO), 2008.
[84] Ana Milanova and Yin Liu. Practical static ownership inference. Technical Report
RPI/DCS-09-04, Rensselaer Polytechnic Institute, Sept. 2009.
[85] Ana Milanova and Yin Liu. Static ownership inference for reasoning against
concurrency errors. In International Conference on Software Engineering 2009,
pages 279–282. IEEE, 2009.
[86] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. Parameterized object
sensitivity for points-to analysis for java. ACM Transactions on Software Engi-
neering and Methodology, 14(1):1–41, January 2005.
[87] Gordon E. Moore. Cramming more components onto integrated circuits. Elec-
tronics, 38(8):4–7, April 1965.
[88] Peter Muller and A. Poetzsch-Heffter. A type system for controlling representa-
tion exposure in java. Technical Report 269, Fernuniversitat Hagen, 2000.
[89] Peter Muller and A. Rudich. Ownership transfer in universe types. In Object-
Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages
461–478. ACM Press, 2007.
[90] Stefan Nageli. Ownership in design patterns. Master’s thesis, Software Technol-
ogy Group, Department of Computer Science, ETH Zurich, 2006.
[91] James Noble, Jan Vitek, and John Potter. Flexible alias protection. In Proceedings
of the European Conference on Object-Oriented Programming 1998, volume LNCS
1445, pages 158–185. Springer-Verlag Berlin Heidelberg, 1998.
[92] Kirsten Nygaard and Ole-Johan Dahl. The development of the SIMULA languages.
In Richard L. Wexelblat, editor, Proceedings of the ACM SIGPLAN History of
Programming Languages Conference, ACM Monograph Series, pages 439–493,
Los Angeles CA, 1978.
[93] M. Odersky, P. Altherr, V. Cremet, I. Dragos, G. Dubochet, B. Emir,
S. McDirmid, Stephane Micheloud, N. Mihaylov, M. Schinz, E. Stenman,
L. Spoon, and M. Zenger. An overview of the Scala programming language,
2nd edition. Technical Report LAMP-REPORT-2006-001, Ecole Polytechnique
Federale de Lausanne, Lausanne, Switzerland, 2006.
[94] OpenMP Architecture Review Board. OpenMP Application Programmer Inter-
face version 2.5. OpenMP Architecture Review Board, 2005.
[95] Alex Potanin. Generic Ownership — A Practical Approach to Ownership and
Confinement in OO Programming Languages. PhD thesis, School of Engineering
and Computer Science, Victoria University of Wellington, 2007.
[96] W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communica-
tions of the ACM, 33(6):668–676, June 1990.
[97] W. Pugh. The Omega Test: A fast and practical integer programming algorithm
for dependence analysis. In 1991 ACM/IEEE Conference on Supercomputing,
pages 4–13. ACM, 1991.
[98] William Pugh and David Wonnacott. Eliminating false data dependences using
the Omega Test. In PLDI ’92: Proceedings of the ACM SIGPLAN 1992 confer-
ence on Programming language design and implementation, pages 140–151, New
York, NY, USA, 1992. ACM.
[99] Mohammad Raza, Cristiano Calcagno, and Philippa Gardner. Automatic paral-
lelization with separation logic. In 18th European Symposium on Programming,
ESOP 2009, volume LNCS 5502/2009, pages 348–362, 2009.
[100] W Reid, W Kelly, and A Craik. Reasoning about data parallelism in mod-
ern object-oriented languages. Australasian Computer Science Conference 2008,
2008.
[101] J. C. Reynolds. Intuitionistic reasoning about shared mutable data structures.
In J. Davies, B. Roscoe, and J. Woodcock, editors, Millennial Perspectives in
Computer Science, pages 202–321. Palgrave, Houndsmill, Hampshire, 2000.
[102] John C. Reynolds. Separation logic: A logic for shared mutable data structures. In
Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science
(LICS’02), pages 55–74, 2002.
[103] Vijay Saraswat. Report on the Programming Language X10 - Version 2.0.3. IBM,
April 2010.
[104] J. Schafer and A. Poetzsch-Heffter. A parameterized type system for simple loose
ownership domains. Journal of Object Technology, 6(5):71–100, 2007.
[105] J. Schafer and A. Poetzsch-Heffter. Coboxes: Unifying active objects and struc-
tured heaps. Formal Methods for Open Object-Based Distributed Systems, 5051
of LNCS:201–219, 2008.
[106] J. Schafer, M. Reitz, J.-M. Gaillourdet, and A. Poetzsch-Heffter. Linking pro-
grams to architectures: An object-oriented hierarchical software model based on
boxes. The Common Component Model Example, 5153 of LNCS:238–266, 2008.
[107] Michael L. Scott. Programming Language Pragmatics. Academic Press, first edi-
tion edition, 2000.
[108] Matthew Smith. A Model of Effects with an Application to Ownership Types.
PhD thesis, Imperial College London, May 2007.
[109] A.C. Sodan, J. Machina, A. Deshmeh, K. Macnaughton, and B. Esbaugh. Par-
allelism via multithreaded and multicore cpus. Computer, 43(3):24–32, March
2010.
[110] Sriram Srinivasan and Alan Mycroft. Kilim: Isolation-typed actors for Java. In
European Conference on Object-Oriented Programming 2008, volume LNCS 5142,
pages 104–128. Springer-Verlag Berlin Heidelberg, 2008.
[111] B. Steensgaard. Points-to analysis in almost linear time. In ACM Symposium on
Principles of Programming Languages, pages 32–41, New York, 1996. ACM.
[112] B. Stroustrup. A history of C++: 1979–1991. In Thomas J. Bergin, Jr. and
Richard G. Gibson, Jr., editors, Proceedings of the ACM History of Programming
Languages conference (HOPL-2), pages 699–769, New York, NY, USA, 1993.
ACM.
[113] Sun Microsystems. JDK 1.1.1 signing flaw. March 1997.
[114] Sun Microsystems. JSR 14: Add generic types to the Java programming language.
Technical report, Sun Microsystems, 2004.
[115] Sun Microsystems Research. The Fortress Language Specification Version 1.0
Beta. Sun Microsystems, Menlo Park, CA, USA, 2007.
[116] Herb Sutter. A fundamental turn toward concurrency in software. Dr. Dobb’s
Journal, 30(3), March 2005.
[117] Nikhil Swamy, Michael Hicks, Greg Morrisett, Dan Grossman, and Trevor Jim.
Safe manual memory management in cyclone. Science of Computer Programming,
62(2):122–144, October 2006.
[118] Peter Thiemann. Towards a type system for analyzing JavaScript programs. In
14th European Symposium on Programming, volume LNCS 3444, pages 408–422.
Springer-Verlag Berlin Heidelberg, 2005.
[119] Laurence Tratt and Roel Wuyts. Guest editors’ introduction: Dynamically typed
languages. IEEE Software, 24:28–30, 2007.
[120] University of St. Petersburg. Parallelism dwarfs project. http://
paralleldwarfs.codeplex.com/, April 2009.
[121] P. Wegner. Concepts and paradigms of object-oriented programming. ACM
SIGPLAN OOPS Messenger, 1:7–87, 1990.
[122] Adam Welc, Suresh Jagannathan, and Antony Hosking. Safe futures for Java.
In OOPSLA ’05: Proceedings of the 20th annual ACM SIGPLAN conference on
Object-oriented programming, systems, languages, and applications, pages 439–
453, New York, NY, USA, 2005. ACM.
[123] Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation
for the java query language. In Proceedings of the ACM Conference on Object-
Oriented Programming Systems, Languages & Applications, pages 1–17. ACM
Press, 2008.
[124] N. Wirth. Type extensions. ACM Transactions on Programming Languages and
Systems, 10(2):204–214, 1988.
[125] A. Wöß, M. Löberbauer, and H. Mössenböck. LL(1) conflict resolution in a
recursive descent compiler generator. In Joint Modular Languages Conference
(JMLC’03), Klagenfurt, 2003.
[126] Andrew K. Wright and Matthias Felleisen. A syntactic approach to type sound-
ness. Technical Report TR91-160, Rice University, Houston, TX 77251-1892, June
1992.
[127] T. Wrigstad. Ownership-Based Alias Management. PhD thesis, Royal Institute of
Technology, Stockholm, 2006.
[128] Tobias Wrigstad and Dave Clarke. Existential owners for ownership types. Jour-
nal of Object Technology, 6(4):141–159, May-June 2007.
[129] Yoav Zibin. Featherweight Ownership and Immutability Generic Java (FOIGJ).
Technical Report ECSTR10-05, School of Engineering and Computer Science,
Victoria University of Wellington, March 2010.