A Framework for Reasoning about Inherent Parallelism in Modern Object-Oriented Languages

Andrew Craik
BASc Comp Eng Distinction
University of Waterloo, June 2007

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

February 2011

Principal Supervisor: Dr. Wayne Kelly
Associate Supervisor: Professor Paul Roe

Discipline of Computer Science
Faculty of Science and Technology
Queensland University of Technology
Brisbane, Queensland, AUSTRALIA

Page 1: A Framework for Reasoning about Inherent Parallelism in ...eprints.qut.edu.au/40877/1/Andrew_Craik_Thesis.pdf · A Framework for Reasoning about Inherent Parallelism in Modern Object-Oriented



© Copyright by Andrew Craik 2011. All Rights Reserved.

The author hereby grants permission to the Queensland University of Technology to reproduce and redistribute publicly paper and electronic copies of this thesis document in whole or in part.


Keywords

programming languages; Ownership Types; parallelization; inherent parallelism; conditional parallelism; effect system.


Abstract

With the emergence of multi-core processors into the mainstream, parallel programming is no longer the specialized domain it once was. There is a growing need for systems to allow programmers to more easily reason about data dependencies and inherent parallelism in general purpose programs. Many of these programs are written in popular imperative programming languages like Java and C#.

In this thesis I present a system for reasoning about side-effects of evaluation in an abstract and composable manner that is suitable for use by both programmers and automated tools such as compilers. The goal of developing such a system is both to facilitate the automatic exploitation of the inherent parallelism present in imperative programs and to allow programmers to reason about dependencies which may be limiting the parallelism available for exploitation in their applications. Previous work on languages and type systems for parallel computing has tended to focus on providing the programmer with tools to facilitate the manual parallelization of programs; programmers must decide when and where it is safe to employ parallelism without the assistance of the compiler or other automated tools. None of the existing systems combine abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about inherent parallelism.

In this work I present a system for abstractly reasoning about side-effects and data dependencies in modern, imperative, object-oriented languages using a type and effect system based on ideas from Ownership Types. I have developed sufficient conditions for the safe, automated detection and exploitation of a number of task, data, and loop parallelism patterns in terms of ownership relationships.


To validate my work, I have applied my ideas to the C# version 3.0 language to produce a language extension called Zal. I have implemented a compiler for the Zal language as an extension of the GPC# research compiler as a proof of concept of my system. I have used it to parallelize a number of real-world applications to demonstrate the feasibility of my proposed approach. In addition to this empirical validation, I present an argument for the correctness of the type system and language semantics I have proposed, as well as sketches of proofs of the correctness of the proposed sufficient conditions for parallelization.


Contents

1 Introduction
  1.1 Parallelism
    1.1.1 Explicit vs. Inherent Parallelism
    1.1.2 Developing Parallel Programs
    1.1.3 Parallelization
  1.2 Background
    1.2.1 Traditional Data Dependency Analysis
      1.2.1.1 Scalar and Local Dependency Analysis
      1.2.1.2 Pointer/Reference May-Alias Analysis
      1.2.1.3 Array Data Dependency Analysis
      1.2.1.4 Summary
  1.3 Type Systems
  1.4 My Approach
    1.4.1 Object-orientation
    1.4.2 Ownership Types
  1.5 Contributions
  1.6 Outline
2 Reasoning About Parallelism
  2.1 Side-effects
    2.1.1 Abstracting Effects
    2.1.2 Effect System Details
    2.1.3 Effect System Complications due to Object-Orientation
  2.2 Detecting Data Dependencies

  2.3 Task Parallelism
    2.3.1 Sufficient Conditions
  2.4 Loop Parallelism
    2.4.1 Data Parallel Loops
      2.4.1.1 Foreach Loops
      2.4.1.2 Enhancing the Foreach Loop
      2.4.1.3 The Enhanced Foreach Loop
      2.4.1.4 Loop Rewriting
    2.4.2 Pipelining
      2.4.2.1 Foreach Loops
      2.4.2.2 Enhanced Foreach Loops
  2.5 Summary
3 Realization of the Abstract System
  3.1 Encapsulation Enforcement
  3.2 Ownership Types
    3.2.1 Generics
    3.2.2 Subtyping
    3.2.3 Type Compatibility
  3.3 Side-effects Using Contexts
    3.3.1 Heap Effects
    3.3.2 Stack Effects
  3.4 Effect Disjointness
    3.4.1 Heap Effect Disjointness
    3.4.2 Facilitating Upwards Data Access
    3.4.3 Context Constraints
    3.4.4 Runtime Ownership Tracking
    3.4.5 Stack Effect Disjointness
  3.5 Realizing the Sufficient Conditions for Parallelism
    3.5.1 Task Parallelism
    3.5.2 Data Parallel Loops

    3.5.3 Pipelining
  3.6 Summary
4 Application to an Existing Language
  4.1 Language Overview
  4.2 Choice of Language
  4.3 Syntactic Features
    4.3.1 Basic Syntax
      4.3.1.1 Class-Level Context Parameters
      4.3.1.2 Method-Level Context Parameters
      4.3.1.3 Context Constraints
      4.3.1.4 Method Effect Declarations
    4.3.2 Subroutine Constructs
      4.3.2.1 Properties
      4.3.2.2 Indexers
      4.3.2.3 Delegates
      4.3.2.4 Events
      4.3.2.5 Anonymous Methods
      4.3.2.6 Lambda Expressions
      4.3.2.7 Extension Methods
    4.3.3 Types
      4.3.3.1 Ref and Out Call Parameters
      4.3.3.2 Partial Types
      4.3.3.3 Interfaces
      4.3.3.4 Arrays
      4.3.3.5 User Defined Value Types
      4.3.3.6 Static Classes
      4.3.3.7 Nullable Types
      4.3.3.8 Existing Types
  4.4 Statics
    4.4.1 Static Fields

    4.4.2 Methods
  4.5 LINQ
5 Formalization
  5.1 Type System
    5.1.1 Type Rules
    5.1.2 Effect Rules
  5.2 Proof of Ancestor Tree Search Algorithm
  5.3 Proof of Condition Correctness
    5.3.1 Task Parallelism
    5.3.2 Data Parallelism
    5.3.3 Pipeline
  5.4 Summary
6 Implementation
  6.1 Background
  6.2 Implementation Attribution
  6.3 Design
    6.3.1 Scanner & Parser
    6.3.2 Abstract Syntax Tree
    6.3.3 Design of the Pluggable Type System
      6.3.3.1 Generic Types
      6.3.3.2 Extracting an Abstract Type Parameter Infrastructure
    6.3.4 Effect Calculation and Validation
      6.3.4.1 Heap Effects
      6.3.4.2 Stack Effects
      6.3.4.3 Loop Body Rewriting
  6.4 Zal Code Generation
    6.4.1 Runtime Ownership Implementation and Tracking
      6.4.1.1 Ownership Implementation
      6.4.1.2 Properties & Indexers

      6.4.1.3 Sub-contexts
      6.4.1.4 Arrays
      6.4.1.5 Statics
    6.4.2 Enhanced Foreach Loop
  6.5 Parallelization
    6.5.1 Context Relationship Testing
    6.5.2 Implementation
      6.5.2.1 Data Parallelism
      6.5.2.2 Pipelining
      6.5.2.3 Task Parallelism
  6.6 Summary
7 Validation
  7.1 Test Platform
  7.2 The Examples
    7.2.1 Ray Tracer
    7.2.2 Calculator
    7.2.3 Bank Transaction System
    7.2.4 Spectral Methods
  7.3 The Annotation Process
  7.4 The Results of Annotation
    7.4.1 Ray Tracer
    7.4.2 Calculator
    7.4.3 Bank Transaction System
    7.4.4 Spectral Methods
    7.4.5 Summary
  7.5 The Results of Compilation
    7.5.1 Ray Tracer
    7.5.2 Calculator
    7.5.3 Bank Transaction System
    7.5.4 Spectral Methods

  7.6 Performance
    7.6.1 Runtime Overhead
    7.6.2 Memory Overhead
  7.7 Summary
8 Comparison with Related Work
  8.1 Background to Type Systems and Data Flow Analysis
  8.2 Traditional Data Dependency Analysis
    8.2.1 Array Dependence Analysis
    8.2.2 May-Alias Analysis
  8.3 Automatically Parallelizing Compilers
  8.4 Type and Effect Systems
    8.4.1 FX
    8.4.2 Ownership Types
      8.4.2.1 Ownership Side-Effects
      8.4.2.2 Applications to Parallelism
      8.4.2.3 Summary
    8.4.3 Universe Types
      8.4.3.1 Applications to Parallelism
    8.4.4 Ownership Domains
      8.4.4.1 Effects
    8.4.5 Boxes
    8.4.6 Uniqueness, Read-Only References and Immutability
    8.4.7 SafeJava
    8.4.8 Deterministic Parallel Java
  8.5 Logics
    8.5.1 Hoare & Separation Logic
  8.6 Programming Languages for Parallelism
    8.6.1 Haskell & other Functional Languages
    8.6.2 Cyclone
    8.6.3 Scala

    8.6.4 High Productivity Computing Languages
    8.6.5 Spec#
  8.7 Alternative Concurrency Abstractions
    8.7.1 Futures
      8.7.1.1 Task Parallel Library & Parallel LINQ
      8.7.1.2 OpenMP
    8.7.2 Message Passing
      8.7.2.1 Actor Model
      8.7.2.2 MPI
      8.7.2.3 Jade
  8.8 Object-Oriented Paradigm Considerations
    8.8.1 Design Patterns
  8.9 Summary
9 Conclusion & Future Work
  9.1 Summary
  9.2 Contributions
  9.3 Conclusions
    9.3.1 Memory Overhead
    9.3.2 Annotation Overhead
      9.3.2.1 Ownership Inference
      9.3.2.2 Ownership Transfer
      9.3.2.3 Temporary Owners for Transient Objects
    9.3.3 Loop Parallelism Limitations
      9.3.3.1 Improved Handling of Collections
      9.3.3.2 Light's Associativity Test
    9.3.4 Language Limitations
      9.3.4.1 Liberalization of the Stack Model
      9.3.4.2 Handling Unsafe Code Blocks
      9.3.4.3 Multiple Ownerships to Model Communication Channels
  9.4 Summation
Bibliography

List of Figures

1.1 A diagram showing the relationships between algorithms, programs, and parallelism. The process of parallelization is the process of making implicit parallelism explicit so that it can be exploited.

1.2 Abstract memory regions which can be named as effects. The areas of overlap between regions indicate possible data dependencies.

1.3 Memory regions based on an object-oriented program's representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representation. The inclusion of one triangle within another indicates that a data dependency could exist if they were named as effects on different operations.

2.1 Memory regions based on an object-oriented program's representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representations. The inclusion of one triangle within another is the basis of effect abstraction and dependency detection.

2.2 The overlapping of the execution of loop iterations by using S_A through S_D as pipeline stages. Notice how the iterations ripple through the different stages as time progresses.

2.3 Graph to help visualize dependencies permitted between specific executions of pipeline stages. S_A through S_D represent pipeline stages ordered from left to right and loop iterations are arranged from top to bottom.

3.1 Ownership relationships between contexts at runtime used as an example of capturing context disjointness using sub-contexts.

4.1 The different components that are used to compile a Zal program and exploit its inherent parallelism at runtime.

5.1 The structure of the proof of sufficient conditions for parallelism correctness presented in Chapter 5. The proofs of the items highlighted in red are cited in the literature rather than re-derived.

5.2 The helper function, ancestors, used to test for context disjointness. Note that Γ represents the type checking environment and Γ(l1) obtains the parent of context l1.

5.3 The base case for the ancestor algorithm inductive proof: a and b are the same node.

5.4 The induction step of the ancestor algorithm proof: b is a parent of the nth parent of a.

5.5 The relationship between contexts c1, c2, and object x.

5.6 The relationships between e1, e2, and x as used in the proof of effect disjointness.

5.7 The relationships of e1, e2, r, and x and the disjointness of k1 and r for the proof of effect disjointness.

6.1 A diagram showing the different parts of the compiler we have written. The Zal-only operations are shown in white boxes; these steps are skipped by the normal C# compiler.

6.2 An illustration of the Zal compiler source directory structure where class implementation is split across source files using partial classes stored in subdirectories for each stage of compilation.

6.3 This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5.

6.4 This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5 after the amalgamated type parameter wrappers have been added.

6.5 The effect rule for a statement block.

7.1 The scene rendered by the ray tracer from Microsoft's Samples for Parallel Computing with the .NET Framework 4; note the reflective surfaces which increase the rendering complexity.

7.2 Speed-up graph for the ray tracer example.

7.3 Speed-up graph for the calculator example.

7.4 Speed-up graph for the bank transaction processing system.

7.5 Speed-up graph for the spectral methods example.

7.6 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the ray tracer example.

7.7 Graph showing the speedup in the calculator example when the runtime ownership tracking systems are enabled.

7.8 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the simplified bank transaction system.

7.9 Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the spectral methods example.

7.10 Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. The O(n) memory usage of the Dijkstra Views implementation can be clearly seen as the number of nodes increases.

7.11 Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. Note that the height of the ownership tree does not increase with the data size in this example and so the Dijkstra Views memory consumption grows proportional to the number of transactions.

9.1 Illustration of two singly owned objects a and b communicating via a jointly owned context, labelled a&b.

List of Tables

3.1 Table showing the runtime complexity of object creation and relationship testing. Note that n is the height of the ownership tree.

4.1 The four context relationships which can be stipulated in a Zal context constraint clause and their meanings.

6.1 Measures of the relative sizes of the Ownership Extensions and the GPC# compiler in terms of physical lines of code (SLOC-P), logical lines of code (SLOC-L), and cyclomatic complexity (McCabe VG) [76].

6.2 The AST nodes added to represent the contexts, context constraints, effect declarations, and enhanced foreach loops.

6.3 Descriptions of the AST node types which appear in Figure 6.3.

6.4 The interfaces written to provide an abstract structure for the implementation of type parameters.

6.5 The classes used to wrap up lists of type parameters so that different parameter lists do not need to be aware of one another and so that parameters can be checked and resolved collectively.

6.6 Key methods from the interfaces used to abstract type parameters to create a pluggable type system.

6.7 Table of custom attributes used to store declared context parameters and effect information. These attributes are emitted into C# source code produced by the Zal compiler.

6.8 Enhanced foreach loop body delegates based on the optional enhancements declared in the loop header. *Note that the loop body without either of the optional enhancements is a traditional foreach loop and can be handled using the IEnumerable interface as usual.

7.1 Table showing the number of logical lines of source code modified for each of the examples during the annotation process.

7.2 Table showing the number of method definitions modified for each of the examples during the annotation process.

Code Listings

1.1 A small program snippet used to illustrate the idea of implicit parallelism. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 An example of a pair of nested loops used in the discussion of traditional array data dependency analysis. . . . . . . . . . . . . . . . . . . . . . 9

1.3 A code snippet showing the Java 1.1.1 getSigners bug. . . . . . . . . 16

2.1 A simple stereotypical data parallel foreach loop. . . . . . . . . . . . . 32

2.2 A generic foreach loop. . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.3 A generic foreach loop with its body abstracted to a method on the element. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.4 The body of a generic foreach loop extracted as a method on the element type T. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.5 A for loop whose body updates elements of a collection in place and makes use of the index of elements in the collection. . . . . . . . . . . 35

2.6 An example of the syntax for an enhanced foreach loop equivalent to the for loop in Listing 2.5. . . . . . . . . . . . . . . . . . . . . . . . . 35

2.7 A simple enhanced foreach loop used to develop sufficient conditions for parallelism which can be generalized to all enhanced foreach loops. 36

2.8 A generic foreach loop body consisting of four statements. . . . . . . . 38

3.1 A code snippet showing the field and method signature implicated in the Java 1.1.1 getSigners bug. . . . . . . . . . . . . . . . . . . . . . . 44

3.2 A simple stack implementation showing how I annotate classes with context parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.3 An example of a class with both generic type parameters and context parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.4 A simple generic stack implementation showing how I annotate classes with context parameters and how they interact with generic type parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.5 An example showing how a child class maps its formal context parameters to those of its parent. . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6 An example showing how allowing variance in type parameters can create holes in the type system. . . . . . . . . . . . . . . . . . . . . . . 51

3.7 Algorithm for unioning two sets of effects, set1 and set2. Note that the + and − operations are just set addition and subtraction. . . . . . . . 55


3.8 A simple stack implementation illustrating method and constructor effect declarations. Method-level context parameters are specified using a notation inspired by C]'s method-level generic type parameters. . . . . 56

3.9 An example of context constraint syntax on a class with context parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.10 The algorithm for testing if two contexts are disjoint. Each object has a list of ancestor contexts which can be indexed into using []. The || operator has its usual mathematical meaning of magnitude. . . . . . . 63

3.11 A simple stereotypical data parallel foreach loop. . . . . . . . . . . . . 66

4.1 An example of a hashtable which implements a visitor interface which allows the values to be traversed in parallel provided the k and v contexts are disjoint. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.2 The C] implementation of the hashtable example shown in Listing 4.1. 71

4.3 An example of a class parameterized with formal context parameters owner, a, and b. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.4 An example of a field with actual context parameters this, world, and b. 76

4.5 An example of a class extending a class which is parameterized with context parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.6 A method definition with formal context parameters. . . . . . . . . . . 77

4.7 An example of the inference of type parameters to a generic method. . 77

4.8 An example of a class annotated with context parameters and context constraints using Zal's syntax. . . . . . . . . . . . . . . . . . . . . . . 79

4.9 An example of an instance method annotated with effect declarations and context constraint clauses. . . . . . . . . . . . . . . . . . . . . . . 80

4.10 An example of a property being used to read and write a field. . . . . . 81

4.11 Code snippet which shows how the properties in Listing 4.10 could be implemented using methods. . . . . . . . . . . . . . . . . . . . . . . . 82

4.12 An example of a property annotated with read and write effects. . . . . 83

4.13 An example of an automatic property which does not require an explicit field or accessor implementations. . . . . . . . . . . . . . . . . . . . . 83

4.14 An example of an indexer used to convert day names into numerical days of the week from a defined starting point. . . . . . . . . . . . . . . 84

4.15 An example of how to annotate an indexer with context parameters and effects. The classes annotated were previously shown in Listing 4.14. . 85

4.16 The syntax of a delegate taking two Objects and returning an Object. 86

4.17 An example of the use of context parameters and effect declarations on the delegate originally shown in Listing 4.16. . . . . . . . . . . . . . . 86

4.18 An example of a simple C] event using the delegate type EventHandler. 88

4.19 An example of a simple C] event previously shown in Listing 4.18 now annotated with context parameters and effect declarations. . . . . . . . 88


4.20 An example of a simple Reverse Polish Notation calculator which defines binary operations to be applied to the stack as a delegate. The calculator supplies two operations, add and sub, via anonymous method declarations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4.21 An example of an ownership annotated Reverse Polish Notation calculator class based on the original C] example shown in Listing 4.20. Note the effect declarations added to the anonymous methods. . . . . . . . . 90

4.22 A code listing showing the capture of local variables from two different scopes in the anonymous method returned from the operation method. 91

4.23 An example showing how the C] compiler would implement the example shown in Listing 4.22 using private inner classes. . . . . . . . . . . . . 91

4.24 An example showing how the implementation of the example shown in Listing 4.22 would be annotated with context parameters and effect declarations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

4.25 The outer variable capture example from Listing 4.24 annotated with context parameters and effect declarations. . . . . . . . . . . . . . . . 93

4.26 An example code snippet showing the difference in the typing of anonymous methods and lambda expressions. The anonymous method fails to compile because the int i parameter cannot be implicitly converted to a short, while the lambda fails to compile because the delegate it is bound to takes an int parameter; it would succeed if the delegate took a short as a parameter. . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.27 An example of an ownership annotated Reverse Polish Notation calculator class based on the original C] example shown in Listing 4.20 with the anonymous methods replaced by lambda expressions. . . . . . . . . 95

4.28 An example of an extension method which adds a WordCount method to the interface of string. . . . . . . . . . . . . . . . . . . . . . . . . . 95

4.29 An example of an extension method annotated with context parameters and effect declarations showing reads of the extension parameter generating reads of this. . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4.30 Examples of single, jagged, and multi-dimensional arrays and their annotation with context parameters. . . . . . . . . . . . . . . . . . . . . 99

4.31 A code fragment showing the copy on assignment behavior of a user-defined value type. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

4.32 An example of a struct in the form of a two dimensional coordinate in a coordinate system. Note the struct holding a reference to the CoordinateSystem class. . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4.33 The Point value type shown in Listing 4.32 annotated with context parameters and effect declarations. . . . . . . . . . . . . . . . . . . . . 102

4.34 An example of the syntax for declaring an effectTemplate to inject context and effect information onto an existing type. . . . . . . . . . . 103


4.35 An example of the syntax for declaring an effectTemplate which does not add formal context parameters to a type, but still injects context and effect information onto member declarations it contains. . . . . . . 104

4.36 An example of an effectTemplate which adds effect declarations to some of the methods of the ICollection<T>. . . . . . . . . . . . . . . . 105

4.37 An example of a read of a static field of the DataStore class causing a read of the DataStore type context. . . . . . . . . . . . . . . . . . . . 106

4.38 The read of Child.value actually reads the value field declared on the Parent class and so results in a read of the Parent type context. . . . . 106

4.39 An example of a generic type parameter being used in the declaration of a static. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

4.40 An example of a static field whose type is constructed using class context parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

4.41 An example of a simple LINQ query . . . . . . . . . . . . . . . . . . . . 108

4.42 The reduction of the simple LINQ query example shown in Listing 4.41 to a series of chained method invocations. . . . . . . . . . . . . . . . . 109

5.1 The simple stereotypical data parallel foreach loop for which sufficient conditions for parallelization were developed. . . . . . . . . . . . . . . 124

5.2 An example of the style of loop intended for pipelining . . . . . . . . . 128

6.1 Coco/R grammar production for a list of actual context parameters in Zal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.2 Coco/R grammar production for a list of formal context parameters in Zal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.3 Coco/R grammar production for a set of declared read and write effects. 137

6.4 The modified Coco/R production for the enhanced foreach loop in Zal. 138

6.5 An example of the declaration and use of generic types in C]. . . . . . 140

6.6 The key interfaces used to abstract the different kinds of type parameter lists and the methods declared on them. . . . . . . . . . . . . . . . . . 147

6.7 The method for computing the heap effect of a statement block. . . . . 148

6.8 The computEffects method of the NameExpression AST node. . . . . 149

6.9 The implementations of the local effect computation methods LocalEffects on member and name expressions as well as block statements. . . . . . 150

6.10 An example of the mapping of struct this contexts to stack variables during stack effect computation . . . . . . . . . . . . . . . . . . . . . 151

6.11 The IOwnership interface implemented by all types emitted by the Zal compiler when Dijkstra Views based tracking is selected. . . . . . . . . 157

6.12 The IOwnership interface implemented by all types emitted by the Zal compiler when parent pointer based tracking is selected. . . . . . . . . 157

6.13 Extension methods on object which allow ownership properties to be read from any object and set on any object which supports it as implemented for the parent pointer version of the IOwnership interface. . . . 159


6.14 The items in the OwnershipHelpers class which are used to facilitate the implementation of ownership tracking. . . . . . . . . . . . . . . . . 160

6.15 The implementation of a Zal class, with formal context parameters and declared constructor effects, in C] using custom attributes and the OwnershipHelpers library of helper methods. . . . . . . . . . . . . . . 161

6.16 The implementation of a Zal method with declared formal context parameters and effects, in C] using custom attributes and OwnershipHelpers. 162

6.17 An example of the implementation of a Zal property in C]. . . . . . . . 163

6.18 The implementation of the SubContext class which is used to represent sub-contexts declared in type definitions. . . . . . . . . . . . . . . . . 164

6.19 An example of the use of sub-contexts as part of the implementation of a binary tree node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6.20 The static ArrayOwnershipExtensions class which stores ownership information for arrays and provides access to this information via extension methods on the System.Array which provide the same functionality as required of types which implement the IOwnership interface as implemented for the parent pointer based system. . . . . . . . . . . . . . . . 166

6.21 An example of the creation of an array object in Zal and how the ownership of the array is set. Notice that the standard AddChild method is called to set the ownership of the array. The AddChild method calls the Object SetParents method which in turn calls the SetParents extension method on the System.Array type. . . . . . . . . . . . . . . . . . . . . 167

6.22 The implementation of the ContextMap data structure used to implement static fields in classes parameterized by context parameters. . . . 169

6.23 An example of the implementation of a Zal static field in C]. Notice that because the static field could be accessed from outside the class the plain field is retained for backwards compatibility, but that getter and setter methods are supplied for context aware code. . . . . . . . . . . . 170

6.24 An example of how a static method is implemented in C] when the containing type has formal context parameters. . . . . . . . . . . . . . 170

6.25 An example of the implementation of a Zal static property in C]. The original property can be optionally retained for use by existing C] programs, but is omitted for clarity from the listing above. The get and set methods are used by ownership aware code to marshal context parameters to the accessor implementations. . . . . . . . . . . . . . . . . . . 171

6.26 Sample implementations of EnhancedLoop for the IList and IDictionary interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

6.27 The two interfaces supplied with the enhanced foreach loop library which collections can implement so they can be used with the enhanced foreach loop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174


6.28 Sample implementations of the EnhancedLoop method for the library supplied IEnhancedEnumerable and the ParallelEnhancedLoop method for IIndexedEnumerable. The ParallelEnhancedLoop makes use of the Microsoft Task Parallel Library (TPL) parallel foreach loop implementation (see Section 6.5.2.1). . . . . . . . . . . . . . . . . . . . . . . . . 175

6.29 The ConstraintList interface showing context relationship addition, testing, and runtime test generation methods. . . . . . . . . . . . . . . 176

6.30 The extension methods in the runtime ownership library used to test the relationships between arbitrary contexts. . . . . . . . . . . . . . . 177

6.31 foreach loop parallelization using the TPL’s Parallel.ForEach method.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

6.32 An example of how a parallel enhanced foreach loop would be implemented. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

6.33 Conditional foreach loop parallelization using the TPL's Parallel.ForEach method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

6.34 An example of using the pipelining library to create a pipeline; each stage only modifies the ImageSlice and the representation of the filter or detector in that stage, if any. . . . . . . . . . . . . . . . . . . . . . . 180

6.35 The implementation of task parallelism using a TPL Task. . . . . . . . 181

7.1 The original C] Render method with its doubly-nested for loop. . . . . 186

7.2 The C] implementations of the calculator AST Node class and BinaryOperator class. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

7.3 The original C] implementation of the bank transaction system's Transaction and the Bank's transaction processing method. . . . . . . . . . . . . . 189

7.4 A fragment of the sequential C] Spectral Method's key data structures and computational methods. . . . . . . . . . . . . . . . . . . . . . . . 190

7.5 The original sample code used as a running example to show the operation of my proposed ownership annotation algorithm heuristic. . . . . . 191

7.6 The first step of annotating the Result class, adding the owner. . . . . 192

7.7 The completion of the Result class’s annotation. . . . . . . . . . . . . 192

7.8 The first step of annotating the ResultWrapper class. . . . . . . . . . . 193

7.9 The final result of annotating the code shown in Listing 7.5. . . . . . . 193

7.10 The Zal implementation of the original Render method shown in Listing 7.1; note the loop has been rewritten as an enhanced foreach loop. 196

7.11 The Zal implementation of the calculator AST Node class and BinaryOperator class. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

7.12 The Zal implementation of the Bank’s transaction processing method. . 198

7.13 The annotated version of the Spectral Methods example. . . . . . . . . 199

7.14 The C] implementation of the Zal Render method shown in Listing 7.1. 200

7.15 The implementation of part of the Zal calculator example in C]; note the task parallelism in the Compute method. . . . . . . . . . . . . . . 201


7.16 The implementation of the Zal transaction processing method in C]; note the pipelined foreach loop. . . . . . . . . . . . . . . . . . . . . . 202

7.17 The compiler output for the Zal implementation of the Spectral Methods example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203


Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a

degree or diploma at any other higher education institution. To the best of

my knowledge and belief, the thesis contains no material previously published

or written by another person except where due reference is made.

Signature: Andrew Craik

Date:


Acknowledgements

A PhD and the thesis written to obtain it is often viewed as an individual achievement; however, this achievement would not be possible without a host of supporting

personalities and I take this opportunity to thank them all.

Firstly, I would like to thank my parents for their unconditional love and support

and for engendering in me a love of science and learning which drove me to pursue

my education and question why and how the world works. Thank you also to my

grandparents. While not all of them lived to see me complete this work their love

and support across the miles all these years has always been a great encouragement.

Lastly, I would like to thank my lovely Rachel who only joined me part way through

my PhD journey, but who put up with my unsociable lab hours and crazed ramblings.

My candidacy would not have been half as fun, memorable, or enjoyable without her.

No student new to the art of research can hope to learn the craft without a skilled

mentor and I must thank my principal supervisor Dr. Wayne Kelly enormously for his

skilled guidance during my candidacy. I would also like to thank Prof. John Gough,

who together with Dr. Wayne Kelly, helped to write the C] compiler I modified and

extended as part of my thesis. This work would not have been possible without their

assistance. Lastly, I would like to thank Prof. Paul Roe, my associate supervisor, for

our many and varied discussions.

During September 2009 I had the pleasure of visiting the Victoria University of Wellington to present my research and to obtain advice on some of the formalism required to round out the presentation of my ideas.

whom I interacted for a most productive and thoroughly enjoyable stay. I would especially like to single out and thank Dr. Alex Potanin, Prof. James Noble, Dr. David


Pearce, and Dr. Nicholas Cameron for making me so welcome and taking the time to

share their knowledge and expertise with me.

I would also like to thank my lab mates for their comradeship and encouragement as

we struggled through our respective research journeys. While many people have passed

through the lab during my candidature I would especially like to thank Richard Mason,

Darryl Cain, Jiro Sumitomo, Matthew Brecknell, Lawrence Buckingham, Peter Ansell,

Wayne Reid, and Donna Teague.

Lastly, I would like to thank all of my friends at St John Ambulance QLD. My involvement with First Aid Services has provided a wonderful balance to my life during

my candidature; I could not have been made more welcome. I would especially like

to thank the leaders, past and present, of Brisbane Central No. 2 Division including Chris, Damien, Doris, and Clayton for creating such a wonderful environment and

understanding when my candidature had to come first.


Chapter 1

Introduction

“Writing correct and efficient parallel programs is a major challenge that

calls for better tools and more abstract programming models to make thread

programming safer and more convenient” — Sodan et al. [109]

Parallel computing has long been a field of active research in computer science. Since

the earliest days of modern computing in World War II at Bletchley Park, multiple

processing units have been used to concurrently execute parts of a program.

Traditionally, parallel computing was primarily of interest to those working in highly

specialized domains, such as the field of high performance computing, where the emphasis is on maximizing the throughput of carefully written computationally bound

numerical applications. Apart from these specialized programs, the vast majority of

general purpose programs were written and executed in a sequential manner; most computers had only one processor and so there was no benefit to parallelizing programs.

Each new generation of silicon manufacturing technology shrinks the size of a transistor, which allows more transistors to be packed onto a chip of the same size. This fact

is usually stated in the form of Moore's Law, which states that the number of transistors that can be placed on an integrated circuit chip doubles approximately every two years [87]. These additional transistors were, traditionally, used by chip designers

and manufacturers to increase the instruction throughput of the single computational

core found on most chips. This meant that with every new hardware generation, all

applications received a free performance boost.


These single core performance improvements were achieved through a combination of

higher clock speeds, larger caches, and the implementation of hardware optimizations

[116]. By the early 2000s, the limits of these optimization techniques were being reached

due to a number of physical issues: chip heat output began to outstrip the ability to

dissipate it, chip power demand began to exceed what could be easily supplied, and

increased leakage current and other parasitics reduced efficiency [116]. The net result of

this is that computational cores are no longer getting faster. To continue increasing the

instruction throughput of chips, manufacturers began to use increases in the number

of transistors per chip to implement additional computational cores on the same chip.

For applications to fully exploit additional computational cores there will need to be

changes in the design and implementation of software. Computer cores are now growing in number rather than in speed, as was previously the case. Performance gains can

only be realized on newer hardware when the program to be executed can be broken

into chunks for execution on multiple cores concurrently.

The historical focus on sequential program execution has produced tools and programming practices which are not easily amenable to parallelization. The race is now on

to find ways to make parallelism accessible to all programmers for everyday general

purpose computing despite this sequential legacy.

1.1 Parallelism

There is a clearly articulated need to allow more programmers, who do not have specialist parallel computing training, to write programs in mainstream programming languages, like Java and C], which are able to exploit the latest multi-core processors.

Unfortunately, mainstream programming languages, like Java and C], default to using

sequential semantics and so programmers tend to employ sequential problem solving

tactics and write sequential programs.

The amount of parallelism in programs written using these mainstream languages is

primarily dictated by the algorithms they employ. Algorithms can be broadly classified

into two groups: explicitly parallel and sequential. Sequential algorithms often still

contain some exploitable parallelism in the form of inherent parallelism.


1.1.1 Explicit vs. Inherent Parallelism

An explicitly parallel program or algorithm is one that employs parallel constructs

to explicitly specify which parts of the code should be executed concurrently. There

are many parallel programming languages and APIs that allow such parallelism to

be expressed. An implicitly parallel program or algorithm, by contrast, is one where

parallelism can be discovered through program analysis. Through analysis, the parts

of an inherently parallel program that could be executed in parallel without changing

the semantics of the program can be discovered.
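To make the contrast concrete, the following sketch (my illustration, not code from this thesis; it uses Java's standard ExecutorService rather than any construct introduced here) shows explicit parallelism: the programmer states directly which computations run concurrently, instead of relying on an analysis to discover that they may.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExplicitParallelism {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Explicit parallelism: the two tasks are submitted for
        // concurrent execution by the programmer's own choice.
        Future<Integer> a = pool.submit(() -> 1 + 2);
        Future<Integer> b = pool.submit(() -> 3 + 4);
        int sum = a.get() + b.get(); // join both tasks before combining
        pool.shutdown();
        System.out.println(sum);     // prints 10
    }
}
```

An implicitly parallel version of the same computation would simply write `int sum = (1 + 2) + (3 + 4);` and leave it to analysis to show the two additions are independent.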

1.1.2 Developing Parallel Programs

There are two broad classifications of the approaches to developing parallel programs.

In some cases, it is natural to conceptualize the problem being solved in an explicitly

parallel form. In others, it is more convenient to first develop a working sequential

program. Once the sequential version of the program is fully tested and debugged, it

is converted into a parallel form. The process of transforming a working sequential

program into an equivalent parallel program is referred to as parallelization.

When parallelizing a program it is necessary to ensure that the semantics of the original

are not changed during the transformation — the parallel program should still produce

exactly the same results as the sequential program, it should just do so faster. One way

of ensuring that a transformation preserves the semantics of the original program is to

ensure that Bernstein’s Conditions for parallelism are satisfied. Bernstein’s Conditions

[15], as originally defined, state that two blocks of code, S1 and S2, can be safely

executed in parallel provided that:

• IN(S1) ∩ OUT(S2) = ∅

• OUT(S1) ∩ IN(S2) = ∅

• OUT(S1) ∩ OUT(S2) = ∅

where IN(S) is the set of memory locations used by S and OUT(S) is the set of memory

locations written to by S [15]. In this thesis I will rely on Bernstein's Conditions to prove


that the parallelization performed by my system preserves the semantics of the original

program.
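The three conditions above amount to set-disjointness tests, which can be sketched mechanically. The following is my illustrative Java sketch (names like `canRunInParallel` are hypothetical, not from the thesis): given the IN and OUT sets of two statement blocks, it reports whether Bernstein's Conditions permit parallel execution.

```java
import java.util.HashSet;
import java.util.Set;

public class Bernstein {
    // True when blocks with the given IN (read) and OUT (write) sets
    // satisfy all three of Bernstein's Conditions.
    static boolean canRunInParallel(Set<String> in1, Set<String> out1,
                                    Set<String> in2, Set<String> out2) {
        return disjoint(in1, out2)    // IN(S1)  ∩ OUT(S2) = ∅
            && disjoint(out1, in2)    // OUT(S1) ∩ IN(S2)  = ∅
            && disjoint(out1, out2);  // OUT(S1) ∩ OUT(S2) = ∅
    }

    static boolean disjoint(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);          // intersection of the two sets
        return common.isEmpty();
    }

    public static void main(String[] args) {
        // S1: x = 1;  S2: y = 2;  -> no shared locations, parallelizable
        System.out.println(canRunInParallel(
            Set.of(), Set.of("x"), Set.of(), Set.of("y")));         // true
        // S1: x = 1;  S3: z = x + y;  -> S3 reads x written by S1
        System.out.println(canRunInParallel(
            Set.of(), Set.of("x"), Set.of("x", "y"), Set.of("z"))); // false
    }
}
```

The hard part in practice, and the subject of this thesis, is computing accurate IN and OUT sets for object-oriented code in the first place; the disjointness tests themselves are trivial.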

1.1.3 Parallelization

Figure 1.1 shows the relationships between sequential and parallel algorithms and implementations. When parallelizing an application, the algorithms used in some parts of

the program may need to be replaced with alternative algorithms that produce the same

net result while being more amenable to parallelization. Such algorithm transformation cannot generally be performed automatically using tools; algorithm replacement

requires a detailed understanding of program semantics which requires human intellect

and analysis.

Figure 1.1: A diagram showing the relationships between algorithms, programs, and parallelism. The process of parallelization is the process of making implicit parallelism explicit so that it can be exploited.

For other parts of the program, the algorithms employed may not need to be fundamentally changed. Instead, inherent parallelism contained in the original algorithm can

be exploited. Consider the trivial program fragment shown in Listing 1.1.

In this fragment the assignment to z requires the values of x and y to be set. Steps

which set the values of x and y do not depend on one another and so can be executed

in parallel provided that the assignments are completed before computing the value for


int x = 1;
int y = 2;
int z = x + y;

Listing 1.1: A small program snippet used to illustrate the idea of implicit parallelism.

z. This parallelism is called inherent parallelism and exists because the dependencies

between the steps of a sequential algorithm do not necessarily require all previous steps

to have been completed, just those whose results are consumed. This thesis addresses

the analysis required to accurately identify such inherent parallelism.
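As a concrete illustration of exploiting the independence in Listing 1.1 (a Python sketch only, not the mechanism this thesis develops), the two independent assignments can be submitted as tasks and joined before z is computed:

```python
from concurrent.futures import ThreadPoolExecutor

# The assignments to x and y have no mutual dependencies, so they can be
# evaluated as independent tasks; only the assignment to z must wait for
# both results to become available.
with ThreadPoolExecutor(max_workers=2) as pool:
    fx = pool.submit(lambda: 1)    # int x = 1;
    fy = pool.submit(lambda: 2)    # int y = 2;
    z = fx.result() + fy.result()  # int z = x + y; (joins both tasks)

print(z)  # 3
```

The join before computing z is exactly the dependency that the sequential program's order of statements implies; the parallel version preserves it while relaxing the unnecessary ordering between the first two statements.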

There are a number of different techniques which can be used to detect and safely exploit

inherent parallelism in sequential programs. Speculative execution is a major field of

research competing with program analysis techniques in the race to facilitate safe paral-

lelization of applications. In these speculative approaches, different parts of a program are executed in parallel on top of a transactional memory system, without static guarantees about the dependencies between them. If conflicting data accesses occur,

the runtime memory transaction system is expected to return the program to a consis-

tent state before resuming the program’s execution. These speculative approaches have

the benefit of not requiring a detailed analysis before the benefits of parallelization can

be obtained, but this comes at the cost of the complex and resource intensive runtime

needed to ensure the safety and correctness of the program being executed.

Today, programmers can manually parallelize programs using tools like OpenMP [94]

and the Microsoft Task Parallel Library (TPL) [80] to express parallelism. If the se-

quential program is not fully optimized prior to parallelization, this manual process

may yield additional performance gains. However, for well written programs, man-

ual parallelization is a time consuming and error-prone process. It is often hard for

programmers to be certain that such parallelization is safe — that the transformed

program will produce the same results as the original program. As I shall explain

in Section 1.2.1, completely automated parallelization is beyond the current state-of-

the-art. This thesis, as previously stated, aims to develop techniques which can be

used to both identify opportunities for parallelization as well as validating the safety

of programmer supplied parallelization.

I feel the key to successfully exploiting inherent parallelism is facilitating reasoning


about parallelism by both programmers and automated tools. Sequential programs

contain some inherent parallelism which my techniques aim to exploit. Through the

use of the techniques I present in this thesis, finding these opportunities may be made easier and new opportunities may be discovered. The goal of this thesis is to combine

abstraction and composition with parallelization and correctness checking to produce

a framework which helps both programmers and automated tools to reason about

inherent parallelism; no existing system does this.

1.2 Background

There are a number of different programming paradigms; each has its own strengths

and weaknesses. Different paradigms have different core principles and each has its

own classes of problems for which it is best suited. All programming paradigms make

use of a computer’s ability to store intermediate computation results and recall them

for use later in a program’s execution. Where paradigms differ is how this ability is

exposed to the programmer. In the declarative paradigm (which includes functional

and logic programming languages) this ability is not directly exposed — it is used

implicitly through each language’s semantics. The imperative paradigm, by contrast,

does directly expose this ability.

The choice of exposing a computer’s ability to store intermediate results to programmers

or not has a significant impact on the nature of a paradigm and the reasoning required

to parallelize programs written in its member languages. In those which hide this abil-

ity, like functional programming, the side-effects of executing a code block are highly

constrained; access to arbitrary shared state is prohibited and side-effects tightly con-

trolled through the type system. These constraints make reasoning about parallelism

easier. Code blocks in languages exposing access to arbitrary shared state can have

almost arbitrary side-effects which makes reasoning about possible data dependencies,

and hence parallelism, much more difficult.

The imperative paradigm is one of the most popular paradigms in use today. A large

number of heavily used commercial languages are generally classified as being part of

the imperative programming paradigm, including Java, C#, and Visual Basic. I have


chosen to focus this thesis on how to reason about parallelism in imperative languages

for several reasons. Firstly, imperative languages are some of the most popular and

widely used languages in the world today making parallelization of programs in these

languages especially urgent and important. Secondly, reasoning about parallelism in

other paradigms that restrict access to shared mutable state is relatively easier and

so developing techniques for imperative languages is a more interesting research goal.

Finally, a significant amount of knowledge, experience, and skill has been developed

working with imperative languages and it would be desirable to be able to retain some

or all of this while facilitating the use of parallelism.

1.2.1 Traditional Data Dependency Analysis

There are two broad categories of dependencies in programs: control dependencies

and data dependencies. Control dependencies are well studied and understood; mod-

ern programming languages have a number of structured control flow abstractions and

constraints which make reasoning about control dependencies relatively easier. Data

dependencies have also been extensively studied, but have traditionally focused on sci-

entific applications. These well studied scientific applications tend to take the form of

several tightly nested loops traversing array data structures performing complex math-

ematical computations. In this section I will present the traditional data dependency

analysis techniques with a view to identifying a gap in the current techniques and

knowledge that I aim to address in this thesis.

1.2.1.1 Scalar and Local Dependency Analysis

The traditional approach to performing a data dependency analysis is to compare,

pairwise, all of the statements in the code fragment being analyzed to determine the

nature of the dependency between the two statements, if any. The data dependency

analysis itself operates on the level of individual variables. These analysis techniques

operate on value types checking for variable name equality to detect if a dependence

can exist or not. When reference types are encountered a may-alias analysis [6, 111]

needs to be performed to determine which variables might actually be referring to the

same object at runtime.


Methods cause significant problems in these techniques because of the pairwise consid-

eration of statements. Methods abstract sequences of operations in the program and

so to compute dependencies between method invocations and other statements, there

needs to be some means of resolving what operations the method performs. In modern

imperative languages which permit overriding and dynamic binding, it may not be pos-

sible to statically determine which method implementation will be invoked. With the

use of dynamic binding, the method to be invoked may not have been implemented yet

and so a dependence analysis cannot be performed. Worse still, this technique causes

an explosion in the number of dependence analyses which must be performed as the

size of the code fragment increases.

1.2.1.2 Pointer/Reference May-Alias Analysis

Computing data dependencies in programming languages which have pointers, refer-

ences, or reference types is further complicated by the fact that a single memory location

may be referred to by multiple names within a single program. To conduct data de-

pendency analysis in the presence of such aliasing, the traditional solution would be to

undertake a may-alias analysis to determine which variables in the program could refer

to the same location in memory [6, 86].

A may-alias analysis is a static analysis and so it does not have access to actual pointers

and objects. Instead of tracking objects and memory addresses, may-alias analyses try

to disambiguate variables using allocation site information [6]. An allocation site is the

method responsible for allocating an object; therefore, if two objects are created at the

same allocation site a traditional may-alias analysis would identify that the two objects

could be the same. This is an approximation of the program’s actual behavior and can

result in a number of false-positive aliasing relationships being identified [86].
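The allocation-site abstraction can be sketched as a simple set intersection (a Python illustration only; the variable and site names are hypothetical):

```python
def may_alias(var1, var2, alloc_sites):
    # Allocation-site abstraction: two references may alias if the sets of
    # sites at which their targets could have been allocated intersect.
    # This over-approximates runtime behavior, producing false positives.
    return bool(alloc_sites[var1] & alloc_sites[var2])

# Hypothetical analysis result: p and q may both point to objects
# allocated at siteA, while r can only point to siteB allocations.
sites = {"p": {"siteA"}, "q": {"siteA", "siteB"}, "r": {"siteB"}}
print(may_alias("p", "q", sites))  # True  (both may come from siteA)
print(may_alias("p", "r", sites))  # False (disjoint allocation sites)
```

Note that the first result does not mean p and q *do* alias, only that the analysis cannot rule it out; this is the source of the false positives discussed above.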

1.2.1.3 Array Data Dependency Analysis

Loops are one of the major sources of parallelism in imperative programming languages.

Loops can be parallelized in several different ways, depending on the loop’s inter- and

intra-iteration data dependencies. The traditional approach to finding a loop’s inter-


and intra-iteration data dependencies is to perform a loop-specific data dependency analysis. Consider the sample nested loops shown in Listing 1.2.

for (int i = 1; i < n; ++i) {
    for (int j = 1; j < n; ++j) {
        f[i] = g[3 * i - 5] + 1;
        g[2 * i + 1] = i * j;
    }
}

Listing 1.2: An example of a pair of nested loops used in the discussion of traditional array data dependency analysis.

As was the case with the dependency analyses techniques already discussed, array de-

pendency analysis techniques consider all of the statements in the loops being analyzed

in a pairwise manner. So considering the example shown in Listing 1.2, we would want

to determine if g[2 * i + 1] and f[i] could be referring to the same location. The

first step would be to perform a may-alias analysis to determine if the two arrays, f

and g, could be aliases. If they could be aliases then it remains to determine if i and

2 * i + 1 could ever be equal over the range of i.

A number of different techniques have been proposed to try to answer questions about

the relationships between different array index expressions. These techniques operate

only on affine indexing expressions. One of the simplest approaches, proposed by

Banerjee, uses the Greatest Common Divisor (GCD) to determine if the two array index expressions could be equal [11]. Banerjee requires the loop to be normalized

to iterate from 1 to its terminal value, incrementing by 1. Once the loop is in this form

the array index expressions are arranged into the form a * i + b and c * i + d.

Banerjee proved that if a loop-carried dependence exists then GCD(c, a) must divide

(d − b).
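A minimal Python sketch of the GCD test as just described, comparing index expressions of the form a * i + b and c * i' + d (the second pair of coefficients below is hypothetical, chosen to show the test ruling a dependence out):

```python
from math import gcd

def gcd_test_may_depend(a, b, c, d):
    # Banerjee's GCD test: the equation a*i + b = c*i' + d can have an
    # integer solution only if gcd(a, c) divides (d - b).  If it does not,
    # the two index expressions can never be equal and the accesses are
    # provably independent; if it does, a dependence MAY exist.
    return (d - b) % gcd(a, c) == 0

# Listing 1.2: compare the write g[2*i + 1] with the read g[3*i - 5].
# gcd(2, 3) = 1 divides (-5 - 1), so a dependence cannot be ruled out.
print(gcd_test_may_depend(2, 1, 3, -5))  # True
# Hypothetical pair g[2*i] vs g[2*i' + 1]: gcd(2, 2) = 2 does not
# divide 1, so the accesses are independent.
print(gcd_test_may_depend(2, 0, 2, 1))   # False
```

As the first call shows, the test is conservative: a `True` result is inconclusive, while a `False` result is a proof of independence.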

Techniques for detecting when array index expressions could lead to dependencies

through arrays continued to evolve during the 1990s. In 1991 it was proved that solving

a system of constraint equations for array index expressions to find out if they could

cause a data dependency on the array was an NP-complete problem [75]. One of the

most advanced techniques which allows precise solutions to systems of affine array in-

dex equations to detect dependencies was proposed by Pugh in the form of the Omega

Test [98, 97]. The Omega Test was ultimately developed to show that for systems


of affine equations, the data dependence analysis could be performed in a reasonable

amount of time [68].

Function invocations may also cause problems for array data dependency analysis since

it is not possible to tell what a method will do when considering statements in a pairwise

manner. Further, non-affine index expressions or unusual loop traversal patterns can

easily foil many array dependency analyses.

1.2.1.4 Summary

Traditional data dependency analyses have been shown to work well on scientific pro-

grams with small, tightly nested loops [97]. Unfortunately, the limitations of inter-

procedural analyses and may-alias analyses mean that in popular, imperative, object-

oriented languages, traditional analysis techniques do not always produce accurate

results, which leads to a high rate of false-positive dependencies being identified [58].

These modern languages do not provide language features to help facilitate reasoning

about data dependencies. Even in the best case of fully context sensitive data depen-

dency analysis, the allocation sites tracked are still approximations. This thesis will

contribute a new approach to the problem of performing data dependence analysis for

parallelization purposes by extending an object-oriented language with features to help

facilitate dependency and effect analysis. Unlike the traditional techniques discussed

in this section, my new approach is designed to be abstract, composable, and usable

by both programmers and automated tools. By composable, I mean to say that paral-

lelism analysis can be performed at levels of granularity below that of whole program

analysis. This means that libraries and other reusable software components can be

used in programs without having to include all of their implementation details in any

parallelism analyses undertaken.

1.3 Type Systems

The type system in a programming language is fundamental to how programmers and

automated tools, like compilers, understand and reason about programs. There is a

whole spectrum of languages whose type systems provide different features and different


guarantees of varying strengths about what a program may or may not do at runtime.

Some languages try to enforce guarantees and invariants prior to program

execution while others only type check the program as execution proceeds.

Programming languages can be classified into four groups based on the strength of the

guarantees and invariants provided by the language and on when programs are checked to ensure that they do not violate them.

Languages which validate programs prior to execution are said to be statically typed

while those which defer such validation to runtime are said to be dynamically typed.

Languages which provide strong guarantees and invariants are said to be strongly typed

while those that do not are said to be weakly typed. Most languages fall in a two-

dimensional spectrum between these different criteria [107]; some are more statically

typed than others, for example.

As previously stated, it is my belief that reasoning about parallelism is the key to

making parallelism more accessible. When trying to construct a system for reasoning

about inherent parallelism it is highly desirable to have as much reliable and detailed

information about a program as possible. Statically typed languages, by their nature,

provide invariants which are amenable to validation prior to program execution. These

invariants can sometimes be used to make reasoning about parallelism easier. Because

of this, I have chosen to focus this thesis on strong statically typed languages. The

techniques I present might be adapted to work for other types of language, but this is

beyond the scope of this work.

1.4 My Approach

Computer programmers must maintain a mental image of the algorithm a program is

implementing, the data structures it is operating on, and the control flow required to

implement the algorithm and manipulate the data structures being processed. Likewise, compilers must maintain a similar understanding of a program when computing

program invariants and undertaking performance optimizations on it. Programming

languages have evolved to facilitate this reasoning by providing different levels of ab-

straction. These abstraction mechanisms allow analyses to be compartmentalized and


reduce their complexity by hiding details not required for the analyses. For example,

control flow can be abstracted into functions. The power of abstraction is realized only

when sufficient information is exposed to facilitate reasoning while hiding unneeded

information. Obviously, this balance changes when the nature of the reasoning being

undertaken changes.

One classic example where exposing too much detail made understanding more difficult

is the use of GOTO statements in high-level programming languages. In his famous

letter to the CACM, Dijkstra argued that GOTO statements break the abstractions

used by programmers to reason about the state of a program during execution (the

programmer’s coordinate system as he called it) [40]. Over time, this view has been

largely accepted by designers of high-level programming languages and GOTO has been

replaced by more structured control flow constructs.

In contrast to the problem of the GOTO statement, where the statement causes ab-

stractions to be violated, determining side-effects in imperative programs is complicated

by the lack of detail provided by some control flow abstractions, especially functions.

Functions allow code blocks to be abstracted and reused. Unfortunately, in the family

of imperative programming languages I have chosen to focus on, there are no constraints

on the side-effects a function may have; they can access and update arbitrary shared

mutable state. Worse, their interface contract stipulates only the input and output

data types; the interface does not provide guarantees about the side-effects of invok-

ing a function. This lack of effect information makes reasoning about inter-procedural

dependencies difficult. I feel this is one of the key reasons why there has been only lim-

ited success in making parallelism more accessible to programmers using these popular

languages.

The obvious solution to this problem is to add side-effect declarations to function sig-

natures to capture the memory locations touched by the function. The actual memory

locations described by these declarations may not be known statically and could total

millions of individual memory locations. This means that the memory locations can-

not be enumerated individually on the function signature. It is, therefore, necessary to

abstract and summarize these effect sets in some way. One approach would be to list

logical regions or subsets of the entire memory space that include all of the memory


locations touched. These subsets must contain all of the memory locations touched,

but may also include memory locations not actually touched. Figure 1.2 shows an ab-

stract diagram of memory regions. Dependencies may exist when regions named in the

effect sets concerned overlap (these overlapping areas are visible in Figure 1.2). The

more precise the effect sets the smaller the number of false positives produced when

computing dependencies through set overlap. In cases where the effect sets cannot be

computed, the effect can be conservatively stated as the entire memory space. Effect

precision can be further enhanced by separating the read and write effect sets since the

overlap of two read effects does not create a data dependency.

Figure 1.2: Abstract memory regions which can be named as effects. The areas of overlap between regions indicate possible data dependencies.
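This overlap test over separated read and write effect sets can be sketched as set operations over region names (a Python illustration; the region name "config" is hypothetical):

```python
def regions_conflict(effect1, effect2):
    # Each effect is a (reads, writes) pair of region-name sets.  A data
    # dependency is possible only when a write overlaps a read or another
    # write; overlapping reads are harmless.
    r1, w1 = effect1
    r2, w2 = effect2
    return bool((w1 & r2) | (r1 & w2) | (w1 & w2))

# Two methods that only read the shared region "config" can safely run
# in parallel; a method that also writes "config" conflicts with both.
reader = ({"config"}, set())
writer = ({"config"}, {"config"})
print(regions_conflict(reader, reader))  # False (read/read overlap is safe)
print(regions_conflict(reader, writer))  # True  (read/write overlap)
```

Making the effect sets more precise shrinks the intersections and therefore reduces the false positives this conservative test reports.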

To be able to reason about parallelism, the goal of this thesis, I feel that both program-

mers and automated tools must use a unified system to reason about side-effects and

dependencies to make reasoning about parallelism practical. Programmers must be

able to describe and understand the side-effects in their programs and compilers must

be able to verify side-effects and communicate to the programmer which side-effects in

a program are hindering parallelism. If the parallelization process is shared between

the programmer and the automated tools and they are not using the same reasoning

system then communication is hindered and the effectiveness of the system is reduced.


1.4.1 Object-orientation

Given that I have chosen to focus on reasoning about parallelism in strong statically

typed, imperative languages, the next logical question is to consider precisely what

techniques could be used to build the effect system required. Capturing side-effects

in an abstract and composable manner requires programs to have structure to their

data and code. The family of imperative languages includes a large number of diverse

languages. Some are focused on the sequence of steps required to solve a problem,

but provide relatively few features for structuring the data being processed. Other

languages place more of an emphasis on the data being processed and so provide more

powerful features for structuring data and associating code with data.

Over the years, many different schemes for structuring an imperative program’s data

and code have been developed. Object-orientation is one such scheme which was de-

veloped in the 1960s and 1970s in languages like Simula [16, 92] and Smalltalk [47].

Object-orientation is centered around three fundamental concepts: encapsulation, in-

heritance, and dynamic method binding [107]. These principles encourage programmers

to structure their programs so that implementation details can be abstracted away and

different parts of the program are decoupled from one another.

The most interesting property of object-orientation which makes it attractive as a means

of abstracting and reasoning about side-effects is encapsulation [107]. Encapsulation

means restricting access to implementation details of an object [107]. This means

that objects which form part of an object’s representation (those objects which store

its internal state) should be protected from external access. Objects nest inside one

another to form a representation hierarchy which provides structure to the program.

Encapsulation, therefore, provides a means of abstracting parts of the program to hide

implementation details [57, 5].

In Figure 1.2, I showed how memory regions could be used to abstract program memory

operations, yet retain the ability to detect possible data dependencies between different

operations. Object-orientation allows this idea to be refined in terms of the hierarchy

of object representation formed by the principle of encapsulation. This structure can

be used as a basis for memory regions [28]. Naming an object as being read or written

can be taken to mean that the object itself or any object in its representation is read


or written. Because an object is normally only part of one object’s representation, the

memory regions become shaped like triangles and represent sub-trees of the program’s

representation structure. Two effects overlap if one is part of the other’s representation.

This structure is shown in Figure 1.3.

Figure 1.3: Memory regions based on an object-oriented program’s representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representation. The inclusion of one triangle within another indicates that a data dependency could exist if they were named as effects on different operations.
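Under a tree-shaped representation hierarchy, the overlap test for two representation sub-trees reduces to an ancestor check: the triangles in Figure 1.3 overlap exactly when one object transitively owns the other. A Python sketch over a hypothetical ownership map:

```python
def owner_chain(obj, owner_of):
    # Walk from an object up through its (unique) owners to the root.
    chain = [obj]
    while obj in owner_of:
        obj = owner_of[obj]
        chain.append(obj)
    return chain

def representations_overlap(a, b, owner_of):
    # In a tree-shaped hierarchy, the representation sub-trees of a and b
    # overlap iff one object transitively owns the other.
    return a in owner_chain(b, owner_of) or b in owner_chain(a, owner_of)

# Hypothetical hierarchy: "world" owns "list", which owns "node1" and "node2".
owner_of = {"node1": "list", "node2": "list", "list": "world"}
print(representations_overlap("list", "node1", owner_of))   # True
print(representations_overlap("node1", "node2", owner_of))  # False
```

The second result captures the key benefit: two sibling sub-trees are disjoint by construction, so effects naming them cannot conflict.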

I have chosen to focus this thesis on reasoning about parallelism in imperative object-

oriented languages because of the structure they provide to programs and their popu-

larity. There are a number of advantages and disadvantages to this approach compared

with traditional data dependency analysis techniques and this comparison will be dis-

cussed in detail in Chapter 8. The techniques discussed in this thesis may also apply

to languages with less structure or other types of structuring, but these languages are

considered to be outside the scope of this work.


1.4.2 Ownership Types

The structure of object-oriented programs, while useful to facilitate reasoning, is not

sufficient, on its own, to capture side-effects in an abstract and composable manner.

There has been a large amount of research in the program verification community on

using the structure of object-oriented programs to facilitate validation of invariants.

These systems can be adapted to facilitate inference of parallelism invariants; this

thesis adapts one such system called Ownership Types.

Ownership Types was originally constructed in response to validation experts noticing

that encapsulation enforcement was lacking in many popular object-oriented languages.

Consider the code in Listing 1.3. Note that despite the private annotation on the

private Object[] signers;
...
public Object[] getSigners() {
    ...
    return signers;
}

Listing 1.3: A code snippet showing the Java 1.1.1 getSigners bug.

signers field, it is possible for the getSigners method to return the object referenced

by this field. The private annotation on the field protects only the name of the field

and not the data it contains. This code was the source of the infamous getSigners

bug in Java 1.1.1 for precisely this reason [113]. Ownership Types [28, 25, 27, 95, 73] is

one of the systems originally proposed to enforce this kind of protection in a rigorous

manner.
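The essence of the bug, and of the standard defensive-copy repair, can be sketched in Python (illustrative only; the class and method names are hypothetical and the actual JDK code differed):

```python
class SignedClass:
    def __init__(self, signers):
        self._signers = list(signers)  # intended to be private representation

    def get_signers_leaky(self):
        return self._signers           # leaks a reference to internal state

    def get_signers_safe(self):
        return list(self._signers)     # defensive copy preserves encapsulation

leaky = SignedClass(["trusted"])
leaky.get_signers_leaky().append("attacker")  # mutates the internal list!
print(leaky._signers)  # ['trusted', 'attacker']

safe = SignedClass(["trusted"])
safe.get_signers_safe().append("attacker")    # only the copy is mutated
print(safe._signers)   # ['trusted']
```

An access modifier protects the field name, not the object it refers to; Ownership Types makes the latter protection checkable by the type system rather than relying on programmer discipline such as defensive copying.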

Ownership Types is one means of tracking this information because it makes the pro-

gram’s representation hierarchy explicit through the type system and also provides

facilities for abstracting these representation descriptions. It has been proposed in the

past that Ownership Types could be used to facilitate parallelization [20, 19], but its

use in discovering inherent parallelism has only just begun to be studied [18]. In this

thesis I use Ownership Types to provide a framework for abstractly expressing and

validating side-effects as well as detecting data dependencies which allows reasoning

about inherent parallelism.


1.5 Contributions

This thesis focuses on how to facilitate reasoning about the inherent parallelism in

strong, statically-typed, imperative, object-oriented languages. Reasoning about inter-

procedural dependencies in these imperative languages is complicated by the arbitrary

side-effects that can be caused by methods. The central idea is to build an abstract and

composable effect system that can be used to specify side-effects as part of a method’s

signature and to then use this effect information to detect data dependencies which

can, in turn, be used to find inherent parallelism.

The major contributions of this thesis are:

• Pulling together features from Ownership Types and adapting them to facili-

tate capturing, abstracting, and validating side-effects in a real language on real

programs to facilitate the detection of inherent parallelism.

• Developing and proving sufficient conditions for parallelism in terms of ownership

effects and relationships for a number of different parallelism patterns.

• The development of a runtime system to complement static reasoning about par-

allelism to facilitate conditional parallelization of code blocks.

None of the existing works combine abstraction and composition with parallelization

and correctness checking to produce a framework which helps both programmers and

automated tools to reason about inherent parallelism. To demonstrate the practicality

of the ideas proposed, I have developed an extension of the C# version 3 language called

Zal (which means “dawn” in Sumerian, the earliest known written language). The

reasons for the choice of C# are discussed in Section 4.2. I have written a Zal compiler

which computes and validates effects in addition to using the effect information to safely

inject parallelism into an application. A complementary runtime system, which is used

to facilitate conditional parallelism, has also been written in addition to libraries to

support the ownership and effects annotations added to the C# language. There are

a number of smaller, more technical contributions made through this implementation

process and these will be identified in the appropriate chapters.


The work contained in this thesis has produced publications [100, 31] including a paper

in a well regarded international conference in the area [31] as well as several technical

reports detailing different refinements of the proposed system. I extended the GPC#

compiler’s generic type infrastructure to support arbitrary type parameters. I then

used this infrastructure to help implement a complete compiler for the Zal program-

ming language; a compiler extension totalling over 10,000 lines of code. In addition

to this, I developed a number of runtime libraries to support the tracking of ownership information and the exploitation of conditional parallelism detected at compile-time. All of the publications, the compiler, the runtime libraries, and the sample applications presented in Chapter 7 developed as part of this research project are freely available from the MQUTeR parallelism website [32].

1.6 Outline

Chapter 2 focuses on how to reason about data dependencies and inherent parallelism

in an abstract and composable manner. The chapter begins by explaining how the

representation hierarchy found in all object-oriented programs can be exploited to cap-

ture side-effects in an abstract and composable manner. The chapter then proceeds to

present sufficient conditions for the safe exploitation of task, loop, and pipeline parallelism. To facilitate parallelization, I formulate a new enhanced foreach loop

which can be used to rewrite for and while loops into a form suitable for analysis. The

discussion of the effect system and sufficient conditions is kept abstract and informal in

this chapter. The ideas are refined and formalized in subsequent chapters, culminating in the formal definitions and proofs of correctness presented in Chapter 5.

Chapter 3 further refines the ideas presented in Chapter 2 and presents how these ideas

can be realized using concepts from Ownership Types. The discussion in Chapter 2

focuses on the high-level concepts of how objects and their relationships could be used

to express side-effects. The focus of Chapter 3 is on how object relationships can be

captured as part of a type system using Ownership Types. It also refines and begins to

formalize the sufficient conditions for parallelization presented in Chapter 2 in terms

of the Ownership Types system.


Chapter 4 continues the process of refining and realizing my ideas by presenting the

design of an extension of the C# language called Zal. Zal incorporates the ideas from Chapters 2 and 3 into version 3.0 of the C# language. In doing so, it applies ideas

from Ownership Types to a number of program constructs which have not previously

been annotated with ownership information. It also examines some of the interesting

technical details discovered in undertaking this design and implementation exercise.

Having refined my ideas and incorporated them into a real-world programming lan-

guage, it remains to demonstrate the validity of my approach. Chapter 5 begins this

validation by sketching the formal proofs required for the proposals made in preced-

ing chapters. It also proves that the sufficient conditions for parallelism presented in

Chapter 3 are sound.

Chapter 6 discusses the design of a source-to-source compiler for Zal as well as a runtime

system for tracking ownership information and testing object relationships at runtime.

A number of interesting technical challenges and design issues were encountered during

the implementation process and these are discussed as the design is presented. The

compiler helps to demonstrate that the ideas proposed in this thesis can be realized

and serves as an enabler for the empirical validation presented in Chapter 7.

Having produced a compiler for Zal, it was used to compile a number of representative

sample programs written in Zal. Chapter 7 presents several detailed worked examples

of real programs originally written in C#. This chapter details the annotation of these

examples along with performance data comparing the automatically parallelized Zal

program to a hand-parallelized implementation. The goal of this exercise is to validate

the proposals made in this thesis in terms of efficacy and effectiveness.

With the validation completed, Chapter 8 compares and contrasts the work in this

thesis with other related works. There are three broad areas of related work in the lit-

erature: traditional data dependency analyses, Ownership Types systems, and existing

parallel programming languages. Previous work in Ownership Types has focused on

program verification. The idea of using Ownership Types to reason about parallelism

has been proposed in the literature, but has not been tested until now. Existing work

on languages and type systems for parallel computing has tended to focus on providing


the programmer with tools to facilitate the manual parallelization of programs; pro-

grammers must decide when and where it is safe to employ parallelism without the

assistance of the compiler or other automated tools.

I conclude in Chapter 9 and outline possible areas for future work and further devel-

opment of the system based on my experiences and the results of applying the system

to real-world programs.


Chapter 2

Reasoning About Parallelism

This chapter presents my key ideas and insights about how side-effects can be statically

computed, summarized, and abstracted using memory regions. The intersection of the

effect sets, in terms of memory regions, of two arbitrary code fragments can be used

to detect possible dependencies between them. This static reasoning system forms the

core of this thesis and is used to determine when inherent parallelism can be safely

exploited.

The ideas presented in this chapter are in their most abstract and theoretical form.

Subsequent chapters will refine these ideas into a form where they can be applied to

a specific language and, in turn, that language will be used to parallelize some real

applications to validate my proposals.

This chapter is divided into two parts. The first part presents my ideas for an abstract

and composable effect system. This effect system describes effects for arbitrary code

blocks using memory regions. These effect descriptions contain all of the areas touched

by the code being analyzed, but may also include some additional areas. Following

the development of the effect system, I demonstrate how dependencies can be detected

by computing the intersection of the memory regions touched by the code blocks in

question.

The second half of this chapter develops conditions for the safe exploitation of a num-

ber of different parallelism patterns in terms of this effect system. Because the effects

produced by the effect system may include additional memory locations not actually


touched by a given code fragment, the effects are not precise. This means that condi-

tions presented in terms of these effects are sufficient conditions. In describing these

conditions as sufficient, I am saying that there may be situations where the conditions

do not support parallelization of an application even though such parallelization would

not actually alter the behavior of the program. My system will never permit paralleliza-

tion when doing so could lead to behavior not consistent with the original sequential

version of the program.

The different parallelism patterns discussed in this chapter exploit different types of

inherent parallelism and so each has different sufficient conditions for safe use. This

chapter informally presents these sufficient conditions; they are refined in Chapter 3, and formalized and proven in Chapter 5.

2.1 Side-effects

There are two types of dependencies that can exist in a program: control dependencies

and data dependencies. A control dependency occurs when a statement A determines

if a statement B executes, while a data dependency occurs when a statement B accesses

the same memory as an earlier statement A.
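As a concrete illustration, consider the following hypothetical statements (written in Python purely for brevity):

```python
x = 10          # statement A: writes x
if x > 5:       # the test makes the body control-dependent on A's result
    y = x * 2   # statement B: reads x written by A, a data dependency
assert y == 20
```

Parallelizing A and B would have to respect both relationships: B may only run once the test is decided, and it must see the value of x that A produced.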

Methods in imperative programming languages can be sources of side-effects which

can be involved in both kinds of dependency. Inter-procedural control dependencies

in these languages generally take the form of exceptions. It is possible to document

control-flow side-effects as part of the method signature, as was done with checked

exceptions in Java. Detecting, documenting, and enforcing restrictions on control flow

side-effects is a well studied problem both inter- and intra-procedurally. This thesis

does not concern itself with control-flow dependencies of this type, but does acknowledge that they exist in programs and may need to be accounted for to preserve correctness when parallelizing an application. The remainder of this thesis will focus on reasoning

about data dependencies, a problem that is not nearly as well studied, with a view to

facilitating the discovery and safe exploitation of inherent parallelism in mainstream

modern imperative programming languages.

Imperative programming languages expose the computer’s ability to store and retrieve


intermediate results to programmers. This means that any given program fragment

written in an imperative programming language can update arbitrary parts of the pro-

gram’s shared mutable state. This does not cause a problem when only one instruction

in the program is executed at a time. However, when multiple instructions are executed

at the same time, instructions may attempt to read values before they are generated,

read values that have been updated by instructions that should be later in the pro-

gram, or race to update the same memory location. That is to say that executing

multiple instructions at the same time may cause dependencies between instructions to

be violated.

Traditional dependency analysis techniques operate on the level of individual variables

and memory locations. In order to determine where data dependencies may or may not

exist, it is necessary to undertake some kind of may-alias analysis to determine which

variables may refer to which locations in memory. Mainstream imperative programming

languages provide no basis for tracking or reasoning about aliasing and so these analyses

are difficult, especially when performed inter-procedurally. In the presence of complex

control flow, polymorphism, and dynamic method binding, the results of a may-alias

analysis tend to be that most variables of compatible types are possible aliases due to

the lack of language support. It is my opinion that as a result of this, it is necessary to

change the language to provide support to make reasoning about side-effects and data

dependencies tractable.

In addition to the language features above that can complicate dependency analysis,

there are two other facilities, generally provided by modern imperative languages, which

complicate reasoning about side-effects and dependencies. The first is separate compi-

lation. Separate compilation allows a program to be compiled in separate parts and the

result of those compilations can then be linked together at a later time. This means

that the source for a function may not be available for examination when undertaking

one or more of the compilations.

In this thesis, I advocate capturing the maximum side-effects of an executable definition,

expressed as sets of memory locations read and written, as part of the definition’s

declaration. The programmer may provide effect declarations to be validated against

the actual implementation at compilation time. If the effects of the implementation


exceed the declared effects then the program is considered to be invalid.
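This validation rule amounts to a subset check: the effects computed for a method body must stay within the effects declared on its signature. The sketch below illustrates the idea in Python with hypothetical names; it is not the actual Zal compiler logic.

```python
# Illustrative sketch of declared-effect validation (hypothetical names,
# not the Zal compiler). An effect is a pair of sets: regions read and
# regions written.

def effects_within(computed, declared):
    """True iff the computed (reads, writes) stay inside the declared bounds."""
    computed_reads, computed_writes = computed
    declared_reads, declared_writes = declared
    return computed_reads <= declared_reads and computed_writes <= declared_writes

# A method declaring that it reads only region 'a' and writes only region 'b'
declared = ({"a"}, {"b"})
assert effects_within(({"a"}, {"b"}), declared)           # body within bounds: valid
assert not effects_within(({"a"}, {"b", "c"}), declared)  # extra write: invalid
```

The second assertion corresponds to the rejection case described above: a body whose effects exceed its declaration does not compile.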

The goal of this work is to facilitate reasoning about the side-effects of arbitrary code

blocks for the purposes of computing dependencies and identifying opportunities to

exploit inherent parallelism. Adding effects to method signatures facilitates this because

it makes examining a method’s signature sufficient to determine the upper bounds

of its side-effects, thus eliminating the complications caused by separate compilation.

Others have proposed capturing effects on method signatures, but my formulation of

this idea, presented in this thesis, is geared towards reasoning about parallelism rather

than validating program properties; the traditional domain of such techniques (see

Chapter 8 for further discussion of related work).

Dynamic linking is also used frequently by modern programming languages and their

associated libraries to reduce executable file size and facilitate independent updates

of libraries. In dynamic linking, libraries and other third-party code are loaded into a

program only at runtime. This means that the effects of a method found in one of these

libraries at compile-time may not match the effects of the version loaded at runtime. This

problem is similar to other library versioning problems which have been well studied

and some of the same solutions can be applied. Simply checking that the version of the

library used by the compiler matches the version of the library linked in is sufficient

to avoid the problem of mismatches between effects at compile and execution time.

If greater flexibility is required with regard to versioning, then other more complex

solutions may be required.

2.1.1 Abstracting Effects

Having decided to add side-effects to method signatures, the question which arises is

how to describe method side-effects. Adding side-effects to method signatures could

expose implementation details that would otherwise be hidden from those calling the

method. Exposure of implementation details through effect declarations would break

composition; different implementations of the same interface could have different effect

declarations due to their differing implementation details only. Using an abstract effect

system could help to prevent this, if the abstraction is correctly chosen. Abstracting

effects has the added benefit of allowing precision to be traded for simplicity when the


amount of effect information becomes overwhelming. The key to obtaining all of these

benefits is choosing an appropriate effect abstraction system.

One way to try to obtain a suitable abstraction and composition mechanism for ef-

fects is to base the system on the program’s existing structure. Different imperative

programming languages provide different kinds of structure to programs. Some lan-

guages emphasize the sequence of steps to be executed and rely on functions as the

major unit of abstraction and organization; they provide relatively little structure to

a program’s data. Others place an emphasis on the structure of the program’s data

and provide means to associate code with data. One of the most popular program

structuring techniques in use today is object-orientation. In object-orientation, data is

grouped into objects which represent a concept or artifact in the system and consist of

data and associated functions called methods. Each object has its own internal state,

parts of which may be stored in other objects in the system; these referenced objects

holding this state are said to be part of the object’s representation. Viewing a pro-

gram’s heap memory in terms of objects and the representation relationships between

them provides a hierarchical view of the heap. This hierarchy can then be used to

describe and abstract effects without exposing implementation details. Most popular

modern imperative languages like Java, C#, and Visual Basic employ object-orientation

to structure the program. As outlined in Chapter 1, this thesis will focus on object-

oriented imperative programming languages and will use their representation structure

to help describe side-effects in an abstract and composable manner.

2.1.2 Effect System Details

Having chosen to use the representation hierarchy as a basis of an effect system, the

next question is how exactly side-effects are described in such a system. Side-effects in such a system are named using sets of objects which are read or written. An object named as an effect implicitly includes, as part of the effect, all the objects that form part of its representation. Figure 2.1 shows a number of different objects, illustrated as white circles, and the memory area each encompasses when named; note the implicit inclusion of representations. A side-effect can, therefore, be abstracted by

naming one of the objects whose representation is accessed or modified instead of the


object itself. For example, in Figure 2.1 the effect represented by object 4 could be

abstracted as the effect represented by object 3, which includes it. This system facilitates

information hiding because even when an object’s representation is changed, the effect

of accessing or modifying that representation can still be summarized in the same way.

Precise details of how these effects can be expressed in a language is the subject of the

next chapter, Chapter 3.

Figure 2.1: Memory regions based on an object-oriented program's representation hierarchy. The white circles are objects and the triangles are the areas of memory which form part of their representations. The inclusion of one triangle within another is the basis of effect abstraction and dependency detection.

Composition is the construction of a software assembly from smaller assemblies. An

effect system used in a language designed to support composition, as most modern

imperative programming languages do, needs to provide facilities for abstracting side-

effects so that implementation details are not inadvertently exposed through the effect

system. Composition, in the effect system I am proposing, is achieved through ab-

straction. Two different objects named as effects can be abstracted to a single object

which contains the two objects as part of its representation. This abstraction can be

used to hide implementation details while still providing correct effect information; the


abstraction only makes the effect less precise. This means that interference between

operations which employ the same component units can still be detected when desired.

How to do this is the focus of the next major section, Section 2.2.

It is important to note that other data side-effects, such as reading and writing physical

devices and other system resources, can be captured using my proposed effect system.

These devices and resources can be thought of as other objects which exist in the

system. Provided they are given a name they can be modelled in the effect system like

any other object. This is similar to the idea of memory mapped I/O in hardware. In

the rest of this thesis, reference will be made to side-effects, often with reference to the

objects or memory being read and written, but the reader is asked to remember that

these effects can also include other resources.

2.1.3 Effect System Complications due to Object-Orientation

Having chosen to focus on object-oriented imperative programming languages, there is

one issue which needs to be addressed in relation to method side-effects

— overriding. Method overriding can occur in two situations: replacing an existing

method implementation when subclassing, and implementing a method required by

an interface or abstract class. When overriding is coupled with dynamic binding the

result is that the most specialized implementation of a method is invoked by any given

call site. The downside of this is that it is not possible to determine which method

implementation will be invoked at any given call site at compile-time.

The inability to predict which method implementation will be invoked by a call site

complicates inter-procedural data dependency analysis since it becomes impossible to

determine which method’s signature needs to be examined to obtain the maximum

side-effects of invocation. One solution to this problem, which I have adopted in my

system, is to constrain the side-effects of overriding methods to include at most the

side-effects of the method being replaced. This means that the effects of the original

method declaration can be used in all effect computations and analyses without a need

to resolve exactly which implementation is being invoked. These effect constraints can

be verified at compile-time and methods which do not adhere to these constraints are

considered to be invalid by my system and will not be successfully compiled.
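The override constraint can likewise be pictured as a subset check over declared effect sets, which is what lets a call site rely on the statically known declaration regardless of which implementation dynamic binding selects. The following Python sketch uses hypothetical names rather than the actual compiler's representation:

```python
# Illustrative check of the overriding rule (hypothetical names): an
# overriding method may declare at most the effects of the method it replaces.

def override_valid(override_effects, base_effects):
    reads_o, writes_o = override_effects
    reads_b, writes_b = base_effects
    return reads_o <= reads_b and writes_o <= writes_b

base = ({"this"}, {"this"})  # base method may read and write its receiver
assert override_valid(({"this"}, set()), base)                # narrower: accepted
assert not override_valid(({"this", "global"}, set()), base)  # wider: rejected
```

Because every valid override stays within the base declaration, an analysis can use the base method's effects at any call site without resolving the dispatch target.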


2.2 Detecting Data Dependencies

As discussed in Chapter 1, dependencies can take two forms: control dependencies and

data dependencies. Detecting control dependencies is relatively easy in modern im-

perative object-oriented languages due to the flow control abstractions they use. Data

dependencies are harder to detect because the major unit of control flow abstraction in

modern imperative object-oriented languages, the method, hides side-effect information

and so it is not possible to determine which pieces of data are read and written by a

method without examining its implementation.

The purpose of calculating side-effects was to facilitate the detection of data depen-

dencies. Having decided to use the representational hierarchy to express side-effects in

an abstract and composable manner, there needs to be some way to determine if two

different effects are disjoint or not.

When using the representation hierarchy as the basis of the effect system, side-effects

are expressed by naming objects whose representation is accessed, as discussed in Sec-

tion 2.1. Each object is directly part of at most one object’s representation. Naming

an object as being read or written implicitly names all of the named object’s repre-

sentation as being read or written. An object named as an effect, therefore, names an

entire subtree of objects in the representation hierarchy. Two objects, o1 and o2, when named as effects are said to not overlap when o1 is not part of o2's representation and o2 is not part of o1's representation (that is, o1 # o2 = ∅).

For two effect sets, E1 = {a1, …, an} and E2 = {b1, …, bm}, to be non-overlapping, it is necessary to determine that all of the objects in effect E1 are disjoint from the objects in effect E2. If ai # bj = ∅ for all i ∈ {1, …, n} and j ∈ {1, …, m}, then the effect sets are disjoint.
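Since each named object covers its entire representation subtree, the pairwise disjointness test can be sketched as an ancestor check over an ownership tree. The tree and names below are hypothetical, chosen only to mirror the discussion:

```python
# Illustrative ownership tree: each object maps to the object whose
# representation directly contains it (None for a root).
owner = {"root": None, "o1": "root", "o2": "root", "o3": "o1", "o4": "o3"}

def ancestors(obj):
    while obj is not None:
        yield obj
        obj = owner[obj]

def overlaps(a, b):
    # Naming a or b as an effect covers its whole subtree, so the two
    # regions overlap iff one object lies within the other's representation.
    return a in ancestors(b) or b in ancestors(a)

def effect_sets_disjoint(e1, e2):
    # E1 and E2 are disjoint iff every pair (ai, bj) is non-overlapping.
    return all(not overlaps(a, b) for a in e1 for b in e2)

assert overlaps("o1", "o4")        # o4 sits inside o1's representation
assert not overlaps("o2", "o4")    # separate subtrees
assert effect_sets_disjoint({"o2"}, {"o3", "o4"})
```

The walk up the owner chain is what realizes the subtree-inclusion test; a richer implementation would have to handle the dynamic ownership information discussed in later chapters.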

Effects are specified as a tuple consisting of a set of objects read and a set of objects

written. As previously mentioned, there are three different types of data dependencies

which need to be preserved during parallelization:

• Flow Dependence — a memory location written by one operation is read by a

subsequent operation; a read-after-write (RAW) dependence.

• Output Dependence — a memory location written by one operation is also


written by a subsequent operation; a write-after-write (WAW) dependence.

• Anti-Dependence — a memory location read by one operation is updated by

a subsequent operation; a write-after-read (WAR) dependence.

Given two arbitrary effect tuples ⟨r1, w1⟩ and ⟨r2, w2⟩, it is sufficient to show that r1 does not overlap w2, r2 does not overlap w1, and w1 does not overlap w2 to establish that no data dependencies exist between the code blocks generating the tuples.
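Using plain set intersection as a stand-in for the region-overlap test, the three checks can be written out directly. This is an illustrative Python sketch, not the formal system:

```python
# Illustrative dependence test between two effect tuples <r1, w1> and <r2, w2>.
def independent(effects1, effects2):
    r1, w1 = effects1
    r2, w2 = effects2
    no_flow = not (w1 & r2)    # flow (RAW): S1 writes what S2 later reads
    no_anti = not (r1 & w2)    # anti (WAR): S1 reads what S2 later writes
    no_output = not (w1 & w2)  # output (WAW): both write the same location
    return no_flow and no_anti and no_output

assert independent(({"x"}, {"y"}), ({"x"}, {"z"}))      # shared reads are harmless
assert not independent(({"x"}, {"y"}), ({"y"}, {"z"}))  # RAW dependence on y
```

Note that two blocks reading the same region remain independent; only write-involving overlaps create dependencies.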

It is important to note that just because two object effects are not disjoint does not

mean a data dependence must exist, only that a data dependence may exist. If a data

dependence may exist, then to ensure the correctness of the program, it is necessary to

conservatively execute the original sequential version of the code fragment rather than

a parallel version.

2.3 Task Parallelism

Having completed the discussion of how to abstract side-effects and reason about de-

pendencies in the first half of this chapter, the following two sections examine different

types of parallelism and how they can be detected using this information.

If a programmer has a set of operations that need to be performed, most imperative

programming languages default to having the programmer list the computations in some arbitrary sequence, with each statement terminated by an end-of-statement operator (for example, a semicolon in many ALGOL descendants), thereby imposing a total order on

the computations. In actuality, the dependencies between operations may imply only a

partial order to the steps. The difference between this partial order and the total order

represents potential for parallelism. This type of parallelism is commonly referred to as

task parallelism [9] — the distribution of the execution of different, disjoint operations

across different threads of execution. The partial order can be constructed from the

total order by computing the dependencies between computations.


2.3.1 Sufficient Conditions

The effect system outlined in Section 2.1 discussed how side effects could be captured

using an abstract hierarchical model of a computer’s memory. Consider the paralleliza-

tion of two statements S1 followed by S2. The dependencies between statements can

be calculated based on their side-effects. For the two statements to execute safely in

parallel, provided that no control dependencies which would prohibit parallelization

exist, it is sufficient to show that the following dependencies do not exist:

• flow dependency — the write effects of S1 do not overlap with the read effects

of S2;

• output dependency — the write effects of S1 do not overlap with the write

effects of S2; and

• anti-dependency — the read effects of S1 do not overlap with the write effects

of S2.

As was discussed in Section 2.2, the overlap of two effect sets can be determined by

considering effects pairwise from each set to determine if either one is included in the

other’s representation. If such an inclusion is found there may be a dependence between

the two statements. If no dependencies exist, the two statements may be safely executed

in parallel.
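A minimal sketch of this decision, assuming effect tuples are already available for the two statements: run them concurrently when the pairwise checks pass, and fall back to the original sequential order otherwise. The helper names are hypothetical.

```python
# Illustrative conditional task parallelism (hypothetical helper names).
from concurrent.futures import ThreadPoolExecutor

def no_dependence(effects1, effects2):
    (r1, w1), (r2, w2) = effects1, effects2
    return not (w1 & r2) and not (r1 & w2) and not (w1 & w2)

def run_pair(s1, e1, s2, e2):
    if no_dependence(e1, e2):
        # Parallel version: safe because the effect tuples are disjoint.
        with ThreadPoolExecutor(max_workers=2) as pool:
            f1, f2 = pool.submit(s1), pool.submit(s2)
            f1.result()
            f2.result()
    else:
        # Conservative fallback: preserve the original sequential order.
        s1()
        s2()

log = []
run_pair(lambda: log.append("S1"), ({"a"}, {"b"}),
         lambda: log.append("S2"), ({"c"}, {"d"}))
assert sorted(log) == ["S1", "S2"]
```

The fallback branch reflects the conservative stance taken throughout this chapter: when a dependence may exist, the sequential version runs.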

2.4 Loop Parallelism

Data parallelism is a form of parallelism which arises from the application of a stream

of operations to the elements of a data collection. Data parallelism is one of the ma-

jor sources of parallelism in programs and, more importantly, it is a source of scalable

parallelism. Scalable parallelism is parallelism that is not limited in the number of

threads of parallel execution by the structure of the program. The number of streams

of execution can be increased to improve performance or handle larger data sets as

more processing units become available. Exposing this kind of parallelism allows the


program to exploit additional processing units when they become available, thus allow-

ing the program to benefit from future increases in the number of computational cores

in a computer.

In modern, imperative, object-oriented programs, one of the major sources of data

parallelism is the loop; specifically, loops which operate on data collections. Loops

which operate on data collections can take many different forms and perform a number

of different operations. Most of the operations performed by loops operating on data

collections can be broadly classified into one of three groups as listed below. Note that

the definitions of map, reduce and filter used here are slightly different from those used

in functional programming owing to the nature of imperative programming and its

extensive use of shared mutable state.

• map — A per element operation which modifies the value of the element it

is applied to (it may take additional parameters other than the element to be

updated).

• reduce — An operation which takes two or more elements of a collection and

produces a single result (it reduces a collection to a value).

• filter — An operation which removes elements from a collection based on a

predicate.
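The three classes can be illustrated with small imperative loops; only the map form updates each element independently of the others, which is what makes it the target for data-parallel execution. Python is used here purely for illustration:

```python
values = [1, 2, 3, 4]

# map: a per-element update; each iteration touches a distinct element.
for i in range(len(values)):
    values[i] = values[i] * 2          # values becomes [2, 4, 6, 8]

# reduce: folds the collection to a single result; iterations share 'total'.
total = 0
for v in values:
    total += v                         # total becomes 20

# filter: removes elements by a predicate; the collection itself is shared.
values = [v for v in values if v > 4]  # values becomes [6, 8]

assert total == 20 and values == [6, 8]
```

The reduce and filter loops both mutate state shared across iterations (the accumulator and the collection), which is why their parallelization depends on operation properties rather than side-effects alone.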

Of these three operations, loops applying a mapping operation to elements of a col-

lection are one of the main forms of data parallelism found in mainstream imperative

object-oriented languages today [123, 10]. Loops applying filter and reduce operations

may be parallelized using some kind of tree reduction technique, but doing so would

depend on the properties of the operation being performed rather than just the side-

effects of the operation being applied as is the case with the map operation. This thesis

focuses on the parallelization of mapping loops. Approaches for the parallelization of

reduction loops are considered as part of the future work discussed in Section 9.3.3.2.

This section applies my proposed system to two loop parallelism patterns for loops

which apply a mapping operation to a collection: (1) data parallel loops where loop

iterations execute independently and are distributed across multiple processors and


(2) pipelining where the execution of a loop iteration is divided up into stages and

distributed across multiple processors. These patterns are based on each iteration of

the loop operating on a distinct data element. Loops which process the same data

element multiple times might still be parallelizable, but are not considered in this

section. Note also that some loops may employ a combination of map, reduce, and

filter operations and it may be possible to extract the map operation into a separate

loop to facilitate its parallelization. Automatically doing this extraction is beyond the

current state-of-the-art in the general case and is considered to be outside the scope of

this thesis.

2.4.1 Data Parallel Loops

The data parallelism pattern can be safely applied only if there are no inter-iteration

dependencies. This section begins by looking at the most declarative imperative looping

construct, the foreach loop, before moving on to more complex looping constructs.

2.4.1.1 Foreach Loops

Consider the simple data parallel loop shown in Listing 2.1:

foreach (T element in collection)
    element.operation();

Listing 2.1: A simple stereotypical data parallel foreach loop.
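When the independence requirements developed in this section hold, a loop of this shape maps directly onto off-the-shelf data parallel machinery. The following Java sketch is illustrative only; the Element class and its operation method are hypothetical stand-ins for the element type T and operation() in Listing 2.1, and each operation writes only to its own element's representation:

```java
import java.util.List;

// Hypothetical element type: operation() mutates only this element's own state,
// satisfying the requirement that iterations touch disjoint representations.
class Element {
    private int value;
    Element(int value) { this.value = value; }
    void operation() { value *= 2; }
    int getValue() { return value; }
}

public class ParallelMap {
    public static void main(String[] args) {
        List<Element> collection = List.of(new Element(1), new Element(2), new Element(3));
        // Because each iteration writes only to its own (distinct) element,
        // the iterations are independent and may run on any number of threads.
        collection.parallelStream().forEach(Element::operation);
        collection.forEach(e -> System.out.println(e.getValue()));
    }
}
```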

To execute the loop shown in Listing 2.1 in parallel, the elements of the collection

traversed by the loop must be distinct and there must be no dependencies between

the operations applied to the elements of the collection. If the operation applied to

each element did not write to any shared mutable state, the loop could run safely in

parallel, but it would not be able to perform any useful computational function since the

result could not be stored. Updates to arbitrary shared state could create dependencies

between the operations applied to each element. However, if the operation updates only

the element of the collection it is applied to then the following informal conditions are

sufficient to ensure that no inter-iteration dependencies exist in the loop:


• Loop Condition 1: there are no control dependencies which would prevent loop

parallelization,

• Loop Condition 2: the elements enumerated by the iterator supplying elements

to the loop body must be unique and have non-overlapping representations, and

• Loop Condition 3: the operation mutates only the representation of the ele-

ment on which it is invoked and it does not read the representation of any other

elements in the collection although it can read other, disjoint shared state.

These sufficient conditions will be formalized and proved in subsequent chapters.

Detecting the control dependencies which are the subject of Loop Condition 1 is a much

simpler problem than detecting data dependencies and is outside the scope of this work.

I do not claim any new contribution with respect to detecting control dependencies in

this thesis.

Loop Condition 2 can be satisfied in one of two ways. Either the uniqueness condition

can be dynamically tested just before loop execution or the programmer can assert

that it holds. In the case of a programmer assertion, there is the option of

verifying the uniqueness invariant at runtime or turning off such assertion checking in

order to improve performance. If checked, such a uniqueness invariant could be verified

either during each insertion or only at the point where the invariant actually needs to hold.

The uniqueness assertion can be made by annotating either the collection itself or its

enumerator (a collection may contain duplicates, but if its enumerator returns only

unique elements then the condition is still effectively met). The uniqueness annotation

could be placed on the collection class, or just on specific instances of that collection

class. Which of the above possibilities is used to ensure Loop Condition 2 is met will

depend on programmer and performance considerations — this thesis does not stipulate

a single mechanism.
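As one illustration of the dynamic testing option for Loop Condition 2, the sketch below checks that every element reference in a collection is distinct before a parallel loop is attempted. It is a minimal sketch under stated assumptions: it tests reference uniqueness only, not the stronger requirement that element representations do not overlap, and the method name is hypothetical:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class UniquenessCheck {
    // Returns true if every element reference in the collection is distinct,
    // i.e. no object appears twice (reference identity, not equals()).
    static <T> boolean allElementsDistinct(List<T> collection) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (T element : collection) {
            if (!seen.add(element)) {
                return false;  // duplicate reference: fall back to the sequential loop
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String a = new String("x"), b = new String("x");
        System.out.println(allElementsDistinct(List.of(a, b)));  // distinct objects
        System.out.println(allElementsDistinct(List.of(a, a)));  // same object twice
    }
}
```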

The final condition, Loop Condition 3, prevents flow and anti-dependencies from arising

between iterations. Loop Condition 2 ensures that all of the elements traversed by

the loop are unique and so only one iteration can update each item. If updates were

allowed to any other shared state, different iterations could update the same memory

location which would create an output dependence between iterations.


If any of the conditions is known not to hold, then the original sequential loop should

be executed to preserve program correctness.

Loop Body Rewriting

Now that the sufficient conditions for the parallelization of a simple data parallel loop

have been explored, the question of how to generalize these conditions to handle arbi-

trary foreach loop bodies, like that shown in Listing 2.2, arises.

class Foo {
    ...
    int a;
    foreach (T elem in collection) {
        // sequence of statements possibly including local variable defs
        // and use of the variable a
    }
    ...
}

Listing 2.2: A generic foreach loop.

Fortunately, generalization to arbitrary loop bodies is a natural extension of the tech-

niques for the parallelization of simple foreach loops developed in the previous section.

The loop can be conceptually rewritten as shown in Listing 2.3.

class Foo {
    ...
    int a;
    foreach (T elem in collection) {
        elem.loopBody(this, a);
    }
    ...
}

Listing 2.3: A generic foreach loop with its body abstracted to a method on the element.

The loop body conceptually becomes a method on the element type T as shown in

Listing 2.4. Note that if such a rewrite were to actually be undertaken there could be

problems with access to private fields and methods of the class originally containing

the loop body.

As previously noted, it is not necessary to actually perform this transformation; only to

compute effects and perform dependency analysis as if it had been performed (details

will be discussed when the application of these techniques to a real language and the

implementation of this extended language are discussed).


class T {
    public void loopBody(Foo me, int a) {
        // same sequence of statements replacing
        // all elem by this and all this by me
    }
}

Listing 2.4: The body of a generic foreach loop extracted as a method on the element type T.

2.4.1.2 Enhancing the Foreach Loop

The techniques developed so far work only on data parallel loops in the form of a

foreach loop. These techniques cannot handle arbitrary loops as they provide no

means of associating iterations with distinct data elements. There are, however, some

loops expressed using other looping constructs which are data parallel in nature and

could conceptually be converted to foreach loops. This is not done in many cases

due to the semantic restrictions of foreach loops. Specifically, foreach loops provide

access to each element in a collection, but do not allow for the elements of a collection

to be updated in place. Further, foreach loops do not provide access to the index of

an element within a collection. An example of a for loop which cannot be re-written

as a standard foreach loop for these reasons is shown in Listing 2.5.

for (int i = 0; i < list.Count; ++i) {
    list[i] = func(i, list[i], ...);
}

Listing 2.5: A for loop whose body updates elements of a collection in place and makes use of the index of elements in the collection.

To support such cases, I have extended the syntax and semantics of foreach loops

over collections to support indexing and in-place element updates. Listing 2.6 shows the syntax for expressing the loop shown in Listing 2.5 as

an enhanced foreach loop.

foreach (ref ElementType e at int i in list) {
    e = func(i, e, ...);
}

Listing 2.6: An example of the syntax for an enhanced foreach loop equivalent to the for loop in Listing 2.5.

The two new syntactic features introduced to expose the enhanced functionality of the

foreach loop are the ref keyword and the at <type> <identifier> construct. It is


important to note that these two new pieces of syntax are optional; either or both may

be omitted. In the case where both are omitted, the enhanced foreach loop operates

like a standard foreach loop.

The ref keyword before the element type indicates that assignment to e should be

treated as an in place update of the element in the collection. If ref is omitted, e acts

like the iteration variable in a standard foreach loop — assignment to the variable

updates only the variable and not the collection itself.

The at <type> <identifier> construct provides a means of exposing an index of some kind to

be associated with the elements being retrieved from the collection. The details of how

to implement such a loop are deferred to Chapter 6.
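In a language without this extension, the behaviour of the enhanced foreach loop in Listing 2.6 can be approximated by iterating over the index space. The Java sketch below is one possible desugaring, not the implementation deferred to Chapter 6; func is a hypothetical stand-in for the operation applied to each element:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class EnhancedForeach {
    // Hypothetical per-element operation taking the index and the element.
    static int func(int i, int e) { return e + i; }

    public static void main(String[] args) {
        int[] list = {10, 20, 30};
        // One possible desugaring of the enhanced foreach: each iteration i
        // reads and writes only list[i] (the in-place "ref" update), so the
        // iterations are independent and can be distributed across processors.
        IntStream.range(0, list.length)
                 .parallel()
                 .forEach(i -> list[i] = func(i, list[i]));
        System.out.println(Arrays.toString(list));  // [10, 21, 32]
    }
}
```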

2.4.1.3 The Enhanced Foreach Loop

The enhanced foreach loop retains the declarative style of the regular foreach loop.

There are two possible differences: the write-through update of the collection denoted

by the ref keyword, and the presence of an index of iteration variable. As was done

with standard foreach loops, begin by considering the trivial enhanced foreach loop

shown in Listing 2.7.

foreach (ref Elem e at Index i in collection)
    e.operation(i);

Listing 2.7: A simple enhanced foreach loop used to develop sufficient conditions for parallelism which can be generalized to all enhanced foreach loops.

It is interesting to note that the enhanced foreach loop can be parallelized using the

same sufficient conditions as the standard foreach loop. The index cannot be a source

of a loop-carried dependence due to Loop Conditions 2 and 3. Also note that there

is no requirement that the index values be unique to the iteration, provided the index

values are not modified (this is due to Loop Condition 3). For example, the index could

be a key in a dictionary data structure; multiple values could have the same key.

As with the standard foreach loop, if any of the conditions is known not to hold, then

the original sequential loop must be executed to preserve program correctness.


2.4.1.4 Loop Rewriting

The foreach loop and the enhanced foreach loops considered in this section are declar-

ative in nature. The loop header explicitly identifies the collection the loop is operating

on. With this information, the side-effects of the loop body can be analyzed to try to

classify the operations the loop body is applying to the collection: map, reduce, and/or

filter. When a loop body is identified as a mapping operation, the sufficient conditions

just developed can be applied to determine if there is some exploitable data parallelism.

It may be determined, through analysis, that a loop body is actually a combination of

a mapping operation with some other operations such as reduction and filtering. Each

of these different operations could be a source of different kinds of parallelism and

would need to be parallelized differently, if at all. It may, therefore, be advantageous

to rewrite the foreach or enhanced foreach loop as a sequence of loops applying

individual operations, such as map, reduce, and filter, to the elements of the collection.

Doing so would make it easier to parallelize the loop. This rewriting process could, in

theory, be automated, but would require sophisticated semantic analysis and is beyond

the scope of the current work. A programmer could perform the rewriting and obtain

the benefits for important loops in the application.

Unfortunately, not all looping constructs in mainstream imperative programming lan-

guages are as declarative in nature as foreach loops. for and while loops, for example,

do not make explicit the collections the loop is processing. Identifying the collection

being processed allows the operations being performed in the loop body to be classified

and appropriately parallelized.

It may be possible to perform an analysis on an arbitrary loop body to identify the

collection being traversed, if any, and then rewrite the loop as a foreach or enhanced

foreach loop enumerating the elements of the collection. Automating this analysis

and loop rewriting in the general case would be difficult due to the expressiveness and

flexibility of looping constructs, such as the for and while loops found in modern

imperative languages. Such automation is beyond the scope of this work. However,

it is possible that a programmer could rewrite many common for and while loops as

foreach or enhanced foreach loops and these could then be subjected to the paral-

lelization analyses previously discussed.


2.4.2 Pipelining

The data parallelism pattern for loop parallelization can be applied only to loops with-

out inter-iteration dependencies. In this section, I look at another style of data par-

allelism in the form of loop pipelining. Loop pipelining is a technique for staging the

execution of loop iterations so that no one stream of execution needs to run an entire

iteration. This can allow iterations of some loops, which modify shared mutable state,

to be partially overlapped. In this section, I also explore the pipelining of foreach and

enhanced foreach loops because of their declarative nature. As was the case with data

parallelism, it may be possible to rewrite for and while loops as enhanced foreach

loops to facilitate their pipelining.

2.4.2.1 Foreach Loops

As before, I begin by considering the pipelining of a traditional foreach loop. Consider

the stylized loop shown in Listing 2.8 where S_A through S_D represent statements or

groups of statements in the loop body.

foreach (ElemType elem in collection) {
    S_A;
    S_B;
    S_C;
    S_D;
}

Listing 2.8: A generic foreach loop body consisting of four statements.

The goal of pipelining would be to overlap the execution of parts of the loop body

shown in Listing 2.8 as shown in Figure 2.2. A pipeline stage is a single step in this

overlapped loop body execution. In the case of the example shown in Figure 2.2, S_A

through S_D are each pipeline stages.

For the loop to continue to produce the same results as it would if it were executed

sequentially, certain constraints need to be applied to the side-effects of the pipeline

stages to ensure that no inter- or intra-loop iteration dependencies are violated. To

help visualize some of the dependencies which are allowed to exist between the stages

of a pipelined loop, consider the graph shown in Figure 2.3. In this graph each node

represents the execution of a pipeline stage for a specific iteration of a loop. For


Figure 2.2: The overlapping of the execution of loop iterations by using S_A through S_D as pipeline stages. Notice how the iterations ripple through the different stages as time progresses.

example, the node in the top left corner of the diagram represents the execution of

the pipeline stage containing S_A for iteration number 1. The arrows between the

boxes represent dependencies and the blue highlights show which stages of the different

iterations would execute together.

Figure 2.3: Graph to help visualize dependencies permitted between specific executions of pipeline stages. S_A through S_D represent pipeline stages ordered from left to right and loop iterations are arranged from top to bottom.

The question which naturally arises is how to construct a maximally pipelined im-

plementation of a loop from an arbitrary loop body without violating any inter- or


intra-iteration dependencies. There are a number of existing techniques aimed at doing

just this [4]. These algorithms, which schedule loop bodies for pipelined execution, gen-

erally operate on some form of Data Dependency Graph (DDG). The precise details

of how dependencies are represented in the graph vary slightly from one scheduling

algorithm to another [4], but all require inter- and intra-iteration dependencies to be

identified. All of these DDGs can be built by determining dependencies between all

pairs of statements in the loop body. This means that the effect system I have proposed

can be used as a basis for the construction of a DDG suitable for the application of

these existing techniques.

Rather than presenting a set of sufficient conditions and algorithm for pipelining, as

was done for data parallel loops, I instead describe how dependencies are computed

and classified as inter- or intra-iteration dependencies so that an appropriate DDG can

be constructed for an existing pipelining algorithm. These procedures are presented in

an informal manner here and will be formalized and proved in subsequent chapters.

Dependency calculation for pipelining purposes begins in the same way as for data

parallel loops. The side-effects of the statements in the loop body are calculated using

the loop rewriting technique so that the this context refers to the representation of

the data element being processed. The following algorithm can be used to construct

a generic DDG for the purposes of loop pipelining provided the data elements being

processed by the loop are unique:

1. Consider all statements in the loop body pairwise (including each statement with

itself):

(a) Add an inter-iteration dependency if the two statements share a flow, out-

put or anti-dependence on the representation of any object other than the

data element being processed or any stack variable declared outside the loop

body’s scope.

(b) Otherwise, if the two statements are different, add an intra-iteration de-

pendency if the two statements share a flow, output or anti-dependence on

the representation of the data element being processed or any stack variable

declared inside the loop body’s scope.


The DDG which results from this algorithm can then be specialized to suit a particular

pipelining algorithm.
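Once a scheduling algorithm has assigned statements to stages, the stages themselves are commonly run on separate threads connected by queues so that iteration i+1's early stages overlap iteration i's later stages. The following Java sketch shows this execution structure for a two-stage pipeline; the stage bodies are hypothetical stand-ins for S_A and S_B, and the queue carries the intra-iteration value from one stage to the next while preserving iteration order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TwoStagePipeline {
    public static void main(String[] args) throws InterruptedException {
        int[] inputs = {1, 2, 3, 4};
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        List<Integer> results = new ArrayList<>();

        // Stage A (hypothetical S_A): produces the intermediate value for each
        // iteration and hands it on, so iteration i+1's stage A can overlap
        // with iteration i's stage B.
        Thread stageA = new Thread(() -> {
            try {
                for (int x : inputs) queue.put(x * 10);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Stage B (hypothetical S_B): consumes intermediates in iteration order,
        // so any inter-iteration dependence carried through this stage is respected.
        Thread stageB = new Thread(() -> {
            try {
                for (int i = 0; i < inputs.length; i++) results.add(queue.take() + 1);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        stageA.start(); stageB.start();
        stageA.join(); stageB.join();
        System.out.println(results);  // [11, 21, 31, 41]
    }
}
```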

2.4.2.2 Enhanced Foreach Loops

Having considered the pipelining of the traditional foreach loop, the question of

pipelining the enhanced foreach loop arises. The enhanced foreach loop retains

the declarative style of the standard foreach loop. There are two possible differences

between the standard and enhanced foreach loops: the write-through update of the

collection denoted by the ref keyword, and the presence of an iteration index variable.

Data dependency calculation for enhanced foreach loops proceeds in the same way as

for standard foreach loops. The index can be treated like any other piece of shared

mutable state and the in-place update of an element of the collection is treated like any

other update of shared mutable state which creates an inter-iteration dependency.

2.5 Summary

At the beginning of this chapter, I discussed the complications which arise when trying

to apply traditional data dependency calculation techniques to large general purpose

programs written using modern imperative object-oriented languages. The most sig-

nificant complication is the difficulty of performing may-alias analyses due to a lack of

language support. Because of these complications, I advocate modifying the language

to supply the required support.

Adding side-effects to method signatures makes dependency analysis possible if the

effect system used to describe the effects is correctly chosen. I have advocated an

approach of using the object representation hierarchy in the program to describe effects

in an abstract and composable manner. Under such a system, naming an object as being

read or written implicitly includes all the objects in its representation as well. I then

demonstrated that it is possible to compute the possible existence of data dependencies

by calculating the intersections of the effect sets of code fragments; this allows data

dependencies between arbitrary code blocks to be computed.
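The core of this dependency test can be sketched as plain set intersection over read and write effect sets: a dependence may exist between two fragments exactly when one's writes intersect the other's reads or writes. The Java sketch below is a simplification in which effects are flat string labels rather than the hierarchical, context-based effects this thesis develops; the names are hypothetical:

```java
import java.util.Set;

public class EffectDisjointness {
    // A dependency may exist between two code fragments iff one fragment's
    // writes intersect the other's reads or writes (flow, anti, or output).
    static boolean mayDepend(Set<String> reads1, Set<String> writes1,
                             Set<String> reads2, Set<String> writes2) {
        return intersects(writes1, reads2)    // flow dependence
            || intersects(reads1, writes2)    // anti-dependence
            || intersects(writes1, writes2);  // output dependence
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String x : a) if (b.contains(x)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Effects named by the region of state they touch (hypothetical labels).
        System.out.println(mayDepend(Set.of("a"), Set.of("b"),
                                     Set.of("c"), Set.of("d")));  // disjoint: no dep
        System.out.println(mayDepend(Set.of("a"), Set.of("b"),
                                     Set.of("b"), Set.of("d")));  // flow dependence
    }
}
```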


Using the effect system and dependency detection techniques, I developed sufficient con-

ditions for the application of a number of different parallelizing code transformations.

The patterns discussed included task parallelism, data parallelism, and loop pipelining.

The loop centric patterns were formulated for the declarative foreach loop found in

most modern imperative object-oriented languages. I also proposed an enhanced ver-

sion of the loop which would allow more for and while loops to be re-written using

a more declarative enhanced foreach loop construct. Doing so allows my techniques

to be applied to these loops to help find inherent parallelism which can be exploited

safely.

The ideas presented in this chapter were presented in an abstract form and together

form the core of this thesis. Subsequent chapters will show how the effect system can

be realized as a language extension, reformulate the sufficient conditions in terms of

an actual effect system, and describe the application of the effect system to a real language.

Finally, a number of examples to validate these ideas will be presented.


Chapter 3

Realization of the Abstract

System

In the previous chapter, Chapter 2, I presented my basic ideas on how to reason about

side-effects and parallelism in an abstract and composable manner using the represen-

tation hierarchy present in object-oriented programs. The discussion in Chapter 2 was

high-level and abstract. Having proposed these ideas, the goal now is to refine these

ideas into a form where they can be applied to a real programming language in Chapter 4. The goal of this exercise is to validate my ideas by applying them to a set of

representative sample applications.

The goal of this chapter is to explain how the abstract ideas in Chapter 2 can be realized

using ideas from Ownership Types. The realization in this chapter is specific to modern

imperative object-oriented languages, but is not tailored to a specific language.

This chapter is divided into several parts. I begin with a brief overview of Ownership

Types for readers not familiar with the work. This is followed by a demonstration of

how the ideas presented in Chapter 2 can be realized using Ownership Types. With

the representation hierarchy realized using Ownership Types, I then show how the

effect disjointness operations discussed in Chapter 2 can be expressed using Ownership

Types and effect declarations. The final part of the chapter discusses how the sufficient

conditions for task, data, and pipeline parallelism can be expressed in terms of the

realization proposed at the start of this chapter.


3.1 Encapsulation Enforcement

To be able to realize my proposal from Chapter 2, I need a means of tracking the

representation relationships between objects in a program. Much of the previous work

on tracking representation relationships originates from efforts in the software verification community to track and enforce object encapsulation [57, 5]. Recalling

Listing 1.3 from Chapter 1, consider the code snippet shown in Listing 3.1:

private Object[] signers;
...
public Object[] getSigners() { ... return signers; }

Listing 3.1: A code snippet showing the field and method signature implicated in the Java 1.1.1 getSigners bug.

As was discussed in Chapter 1, despite the private annotation on the signers field shown

in Listing 3.1, it is possible for the getSigners method to return the object referenced

by this field. The private annotation on the field protects only the name of the field

and not the data it contains. This code was the source of the infamous getSigners

bug in Java 1.1.1 for precisely this reason [113].
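The widely reported fix for this class of leak is a defensive copy: rather than returning the internal array, the accessor returns a clone, so the representation remains encapsulated. The sketch below contrasts the two accessors; the class and method names are hypothetical illustrations, not the actual JDK code:

```java
import java.util.Arrays;

public class SignersExample {
    private final Object[] signers = {"alice", "bob"};

    // Returning the internal array leaks the representation: callers can
    // mutate it despite the private field (the essence of the getSigners bug).
    public Object[] getSignersLeaky() { return signers; }

    // The well-known remedy: return a copy so external writes cannot
    // reach the object's internal state.
    public Object[] getSignersSafe() { return signers.clone(); }

    public static void main(String[] args) {
        SignersExample s = new SignersExample();
        s.getSignersSafe()[0] = "mallory";   // mutates only the copy
        System.out.println(Arrays.toString(s.getSignersLeaky()));  // [alice, bob]
        s.getSignersLeaky()[0] = "mallory";  // mutates the real array
        System.out.println(Arrays.toString(s.getSignersLeaky()));  // [mallory, bob]
    }
}
```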

In the above example, the signers field and the object to which it refers

are logically part of the object’s representation. Protecting an object’s representation

through access restriction and prohibition on external modification is called encapsu-

lation. Enforcing encapsulation can make it easier to write and debug object-oriented

programs [5, 57, 28]. Further, detecting encapsulation violations can help in identifying

potential bugs [113]. Because of these properties, encapsulation has been well studied

by the software validation and correctness communities. Some of this work has involved

the creation of type systems to validate or enforce encapsulation in modern imperative

object-oriented languages.

Validating or enforcing encapsulation generally requires each object’s position in the

representation hierarchy to be known. This can be achieved by having each object

track some combination of (1) which object’s representation it is part of and/or (2)

which objects are part of its representation. In Chapter 2, I proposed an effect system

based on the representation hierarchy present in object-oriented programs. A system

designed to check or enforce encapsulation could, therefore, also be used as a basis for


my proposed effect system which uses the representational hierarchy to facilitate effect

abstraction.

There are a number of different encapsulation tracking and enforcement systems doc-

umented in the literature (see Chapter 8 for more details), one popular branch of this

research is Ownership Types [28, 27, 25, 24, 49, 73, 74]. Ownership Types is a system

of type annotations used to track representation relationships [91, 28]. The effect sys-

tem I proposed in Chapter 2 could be realized using a number of these representation

tracking and enforcement systems, but I have chosen to use Ownership Types in this

thesis.

3.2 Ownership Types

There are a number of different kinds of ownership type systems. These different

systems have been designed for a number of different purposes, all related to program

validation. These type systems can be roughly divided into two groups: those which

derive from the original ownership formulation, using explicit ownership parameters, as

proposed by Clarke, Potter and Noble [28], and those which derive from the Universe

Types system proposed by Muller and Poetzsch-Heffter [88] which employ only relative

type annotations rather than explicit ownership parameters. I chose to base my effect

system on the systems employing explicit ownership parameters in the spirit of those

originally proposed by Clarke, Potter and Noble. This style of system captures more

detailed information than the relative ownership annotations found in Universe Types.

For a more detailed discussion of these different systems see Chapter 8.

In this section I present the ownership type and effect system I have developed to fa-

cilitate reasoning about parallelism. In this thesis, I am not claiming a contribution

of any new individual language features. Rather, my contribution is the unique com-

bination of these language features to facilitate reasoning about parallelism. There

are many different ownership systems documented in the literature which have been

proposed for many different purposes. The use of these systems for reasoning about

parallelism has been proposed, but not tested in the literature before this work. The

most closely related systems in the literature have focused on verifying the use of locks


in explicitly parallel programs. This is a very different problem from that of finding

inherent parallelism in a sequential program due to the simpler reasoning and infor-

mation tracking required. My contribution is using ownerships to detect and exploit

inherent parallelism.

In Ownership Types the tracking of representation is achieved through the use of ob-

ject contexts, hereafter referred to simply as contexts. As Clarke, Potter, and Noble

eloquently describe it, “Each object owns a context, and is owned by a context that it

resides within” [28]. An object stores the objects that are part of its representation in

its own context (referred to as the object’s this context). The objects in an object’s

this context are said to be owned by that object. Top-level objects not part of any

object’s representation are said to be owned by the special top context world.

Ownership and contexts are tracked by parameterizing the type declarations in the

system with context parameters. The first context parameter, by convention, is the

owning context. The other context parameters are used to construct types of other

objects used within the class which are not owned by the current object.

The ownership syntax I have used in my system draws upon that of several previous

ownership type systems proposed in the literature [27, 73, 49]. My system varies slightly

from these to suit my aims and to make it easier to parse the syntax when added to

languages which already make extensive use of most of the brackets available on the

standard keyboard. As an example of the ownership syntax I have chosen to employ,

consider the simple stack implementation shown in Listing 3.2.

In Listing 3.2, three classes are declared: Object, Stack, and Element (which holds

one stack entry). The first context parameter on all of these types is the owner and

is the object whose representation an instance of the class belongs to. The second

parameter, present on the Stack and Element types, is the owner of the data stored in

the Stack. Without the second context parameter it would not be possible to construct

the type of the data field in the Element unless the data owner was assumed to be the

same as the owner of the stack. Notice that formal context parameters are specified in

[]s while actual context parameters are listed between ||s. This was done for clarity,

to simplify parsing, and to avoid ambiguity over the meanings of different kinds of

brackets. Notice, on the first line of the Stack class, how the elements of the stack are


Chapter 3. Realization of the Abstract System 47

public class Object[owner] {public bool equals[paramOwner](Object|paramOwner| other) { ... }...

}public class Stack[owner, dataOwner] {

private Element|this, dataOwner| top;public void push(Object|dataOwner| data) {

top = new Element|this, dataOwner|(top, data);}public Object|dataOwner| pop() {

Object|dataOwner| toReturn = top.getData();top = top.getNext();return toReturn;

}public bool contains[otherOwner](Object|otherOwner| other) {

for (Element|this, dataOwner| i = top; i != null; i = i.getData())if (i.getData().equals|otherOwner|(other))

return true;return false;

}}public class Element[owner, dataOwner] {

private Element|owner, dataOwner| next;private Object|dataOwner| data;public Element(Element|owner, dataOwner| next, Object|dataOwner| data){

this.next = next;this.data = data;

}public Object|dataOwner| getData() {

return data;}public Element|owner, dataOwner| getNext() {

return next;}

}

Listing 3.2: A simple stack implementation showing how I annotate classes with contextparameters.

owned by the Stack’s this context as they are part of its internal representation. The

owner of the element is recursively passed through the linked-list of elements.
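The owner/context structure described above can be sketched as a small runtime model. The following Python sketch is my own illustration, not part of the thesis's type system; the names `Context`, `Obj`, and `WORLD` are hypothetical. It shows each object owning a `this` context and residing in exactly one context, with `world` at the root:

```python
class Context:
    """A context holding object representations; owned by an object (None for world)."""
    def __init__(self, name, owner_obj=None):
        self.name = name
        self.owner_obj = owner_obj   # the object whose representation this context stores
        self.members = []            # the objects owned by (stored in) this context

WORLD = Context("world")

class Obj:
    """Each object owns a 'this' context and resides within exactly one context."""
    def __init__(self, name, owning_context):
        self.name = name
        self.this_context = Context(name + ".this", owner_obj=self)
        owning_context.members.append(self)
        self.owner = owning_context

# A top-level Stack is owned by world; its Elements live in the stack's this context.
stack = Obj("stack", WORLD)
elem = Obj("element", stack.this_context)
```

This mirrors Listing 3.2: the stack is owned by `world`, while each element is part of the stack's representation and so is owned by the stack's `this` context.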

I also allow methods to be parameterized with context parameters. This allows methods

to access data from contexts not directly referenced by the class. For example, in

Listing 3.2, the equals method on Object and the contains method on Stack both

take a context parameter that is used to allow the owner of the method's parameter to vary.

This provides greater flexibility and reduces the number of context parameters required

on the average type declaration. It is possible that the actual context parameters for

a method invocation could be inferred from the type parameters as is done in C♯ with

generic type parameters [78]. This feature has not been implemented in my system to


date, but it could easily be added.

3.2.1 Generics

Most modern imperative object-oriented languages employ some kind of generics to

make it easier to write correctly typed programs. The interaction between generic type

parameters and context parameters has been heavily studied by the Ownership Types

community and a number of techniques for intermixing them have been proposed, each

with its own advantages and disadvantages [95]. In my ownership implementation, I

treat generic type parameters as orthogonal to the ownership context parameters. This

means that I do not mix the two annotations; when you construct a type it may have

both type parameters and context parameters as shown in Listing 3.3.

public class Example<T, S>[owner, otherContext] { ... }

Listing 3.3: An example of a class with both generic type parameters and context parameters.

Generic Ownership [95] advocates a different approach in which ownership and type

parameters are intermixed to simplify program syntax. I chose an orthogonal approach

to simplify program parsing and facilitate retrofitting existing programs and program-

ming tools with ownership information. Further discussion of Generic Ownership and

other related work is undertaken in Chapter 8.

C♯, like Java, allows constraints to be placed on generic type parameters to limit the

actual types which may be supplied for a given formal type parameter [78]. I rely on

these type constraints to supply additional ownership information about generic type

parameters, including their associated context parameters.

Listing 3.4 shows a revised version of the stack implementation shown in Listing 3.2.

This implementation uses a generic type parameter to store the type of the data ele-

ments stored in the stack. It is interesting to note that with the introduction of the

generic type parameter T, it is no longer necessary to explicitly include the dataOwner

context parameter on the Element implementation since it does not access or manipu-

late the state of the data referenced by the element.

Listing 3.4 also serves as an example of how additional ownership information can be


public class Object[owner] {public bool equals[paramOwner](Object|paramOwner| other) { ... }...

}public class Stack<T>[owner, dataOwner] where T : Object|dataOwner| {

private Element<T>|this| top;public void push(T data) {

top = new Element<T>|this|(top, data);}public T pop() {

T toReturn = top.getData();top = top.getNext();return toReturn;

}public bool contains[otherOwner](Object|otherOwner| other) {

for (Element<T>|this| i = top; i != null; i = i.getData())if (i.getData().equals|otherOwner|(other))

return true;return false;

}}public class Element<T>[owner] {

private Element<T>|owner| next;private T data;public Element(Element<T>|owner| next, T data) {

this.next = next;this.data = data;

}public T getData() {

return data;}public Element<T>|owner| getNext() {

return next;}

}

Listing 3.4: A simple generic stack implementation showing how I annotate classes withcontext parameters and how they interact with generic type parameters.

supplied for a generic type parameter using type constraints. The Stack class has a

constraint on its generic type parameter which allows the contents of the data elements

stored in the class to be accessed. This allows the contains method to be implemented

as shown.

3.2.2 Subtyping

Having added context parameters to the language, it is necessary to consider how

to handle these parameters when extending a class. I am not the first to consider

this problem; it has previously been explored in a number of other Ownership Types

systems [28, 27, 73]. It turns out that they can be handled in the same way as generic


type parameters. Formal context parameters in the subclass are mapped to formal

context parameters in the parent as part of the extension declaration. A child class may

have as many or as few context parameters as desired, just as is the case with generic

type parameters. There is, however, a requirement that the first context parameter,

the owner, remain the same between parent and child. This constraint is necessary to

guarantee consistency in ownership and facilitate enforcement of the invariants required

to reason about side-effects and parallelism. Listing 3.5 shows an example of how a

subclass’s context parameters are mapped to its parent’s context parameters.

public class Person[owner] {
    String|this| firstName;
    String|this| lastName;
    ...
}

public class Employee[owner, company] : Person|owner| {
    Company|company| employer;
    ...
}

public class Manager[owner, employeeOwner, company] : Employee|owner, company| {
    List<Employee|owner, company|>|this| employees;
    ...
}

public class SelfEmployed[owner] : Manager|owner, this, this| {
    ...
}

Listing 3.5: An example showing how a child class maps its formal context parameters to those of its parent.

In the code shown in Listing 3.5, note how a class’s formal context parameters are

mapped onto the formal context parameters of the class being extended. Child classes

can add additional context parameters, as demonstrated by the Employee and Manager

classes. A child class can also map parent context parameters to this or to the same

context parameter, as demonstrated by the SelfEmployed class. Also of note is the Employee list declaration: actual context parameters are supplied for both the Employee type parameter and the List itself.


3.2.3 Type Compatibility

Whether a value of a given type can be used in a given position or not is dictated by a

language’s type compatibility rules. For example, the type of an assignment’s r-value

must be compatible with the type of its l-value. In an object-oriented language, as long

as the type of the r-value is equivalent to or a subtype of the l-value’s type then the

assignment is valid.

The addition of generic type parameters further complicates the rules for type compatibility. As an example, consider the Node data type shown in Listing 3.6. Given a type Parent and a subtype of it called Child, many programmers would logically expect that a node holding an instance of Child could be used where a node holding an instance of Parent is expected. Unfortunately, allowing this would

open a hole in the type system. Consider the code at the bottom of Listing 3.6 where this is done. The first assignment succeeds, but the next line, where a new node is created and stored in the next field, is no longer correct: the static type of parentEx permits the assignment, yet it places a node holding a Parent into the next field of a node that is only allowed to reference nodes holding a Child object. This inconsistency breaks the type system.

public class Node<T> {
    public Node<T> next;
}

public class Parent { ... }
public class Child : Parent { ... }

...
Node<Parent> parentEx;
Node<Child> childEx;
...
parentEx = childEx;
parentEx.next = new Node<Parent>(); // where variance causes a problem

Listing 3.6: An example showing how allowing variance in type parameters can create holes in the type system.

To solve this problem, most languages employing generic type parameters require strict

equality of type parameters when checking type consistency. The same problem that

occurs with generics can occur with context parameters and can be solved in the same

way. In my system, when checking types parameterized with context parameters, it is


first necessary to ensure the parameterized types are compatible. If generic type param-

eters are present they also need to be compared for compatibility once compatibility

of the parameterized types has been established. Assuming these checks succeed, the

context parameters then need to be checked for compatibility. This is done by checking

that context parameters in equivalent positions (taking into account the parameter

mappings between the superclass and the subclass) have the same name. This is the

same as generic type parameter compatibility and prevents the hole in the type system

demonstrated above from occurring.
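This compatibility check can be sketched in Python. The sketch below is my own hypothetical encoding, not the thesis's implementation; it uses the Listing 3.5 hierarchy as example data. The r-value's class is walked up the superclass chain, remapping context parameters at each step, until the l-value's class is reached; the resulting actual context parameters must then match exactly (invariance, as with generics):

```python
# class table: name -> (parent, parent's context args written over this
# class's formals, this class's formal context parameters)
CLASSES = {
    "Person":       (None, None, ["owner"]),
    "Employee":     ("Person", ["owner"], ["owner", "company"]),
    "Manager":      ("Employee", ["owner", "company"],
                     ["owner", "employeeOwner", "company"]),
    "SelfEmployed": ("Manager", ["owner", "this", "this"], ["owner"]),
}

def supertype(cls, actuals):
    """Map a class's actual context parameters onto its parent's formals."""
    parent, parent_args, formals = CLASSES[cls]
    if parent is None:
        return None
    env = dict(zip(formals, actuals))
    # formals map to the supplied actuals; 'this' maps to itself
    return parent, [env.get(a, a) for a in parent_args]

def compatible(rtype, ltype):
    """An r-value type is compatible with an l-value type iff some superclass
    of the r-value has the l-value's class AND identical context parameters."""
    t = rtype
    while t is not None:
        if t[0] == ltype[0]:
            return list(t[1]) == list(ltype[1])
        t = supertype(*t)
    return False
```

For example, `Employee|a, c|` is assignable to `Person|a|` (the owner is preserved through the mapping) but not to `Person|b|`, because context parameters in equivalent positions must carry the same name.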

3.3 Side-effects Using Contexts

Having defined the operation of Ownership Types in my proposed system, it is now

necessary to consider how effects can be computed in terms of ownership contexts. This

is the key to realizing my ideas on how to reason about side-effects in an abstract and

composable manner.

In mainstream object-oriented programming languages, where objects are allocated on

the heap by default, modelling the operation of the heap is often enough to either en-

force or validate encapsulation. However, reasoning about dependencies and inherent

parallelism in a program requires both stack and heap effects to be considered as op-

erations on both can be sources of dependencies. Different imperative object-oriented

languages generally employ the same basic heap model; the entire heap is globally ac-

cessible throughout the program and allocation of heap memory is always permitted

as long as sufficient free memory exists. The consistent heap model means that the

operation of Ownership Types does not have to be significantly modified to operate on

different imperative object-oriented languages when being used to enforce or validate

encapsulation. Different language syntax and semantics do need to be accommodated,

but the basic operation of the ownership system does not need to change. Unlike the

heap model, the stack models employed by these languages vary greatly in the referenc-

ing and access models they support. Some languages, like Java and C♯, heavily restrict

access to the stack while others, like C++, do not.

In this thesis, I have chosen to focus on languages which employ a highly restricted


stack model. For the moment, I assume that no references to objects residing on the stack are allowed. This restriction will be loosened in Chapter 4, when I apply these techniques

to a specific language. Heap effects can, therefore, be reasoned about independently

from stack effects. This means that all operations which read or write fields cause heap

effects and all operations which read or write stack variables cause stack effects. The

statements that make up a method body may have both stack and heap side-effects.

Each method has its own activation frame on the stack when it is executing and it

uses this frame to store its stack variables. As a result, the lifetime of stack variables

is limited to, at most, the lifetime of the method call in which they occur. Because

there are no references into the stack, all of the stack variables read and written in

a method body will cease to exist when the method returns. As a result, methods

need only declare heap effects as part of their signature since their stack effects cannot

be observed once the method returns. The techniques for computing stack and heap

effects are discussed in the following subsections.

3.3.1 Heap Effects

I have decided to use ownership contexts to express heap effects in my effect system.

Because data dependencies do not exist simply because of memory reads (at least one

write to a location involved in a dependency is necessary) I decided to express heap

effects as a set of contexts read and a set of contexts written. This design decision helps

to reduce the number of false-positive dependencies detected by my system compared

to one where the read and write effect sets are not distinguished. In a system with

a unified effect set, having two effect sets containing the same context means that a

possible dependency between the code blocks in question would have to be assumed

even if they both just happened to read the same memory location.

All heap read and write effects in imperative object-oriented languages originate as

reads or writes of the fields of heap-allocated objects. The rules for computing the

effects of expressions and statements which may contain reads or writes of fields ensure

that these effects are preserved and incorporated into the computed effects. An example

of an effect computation rule, for reading a field, is shown below. As shown,

reading a field of the current object causes a read of the current object’s this context.


Notice that when a field belonging to another object is read, the owner of the object

whose field is read is added to the effect set. Because the this context always refers

to the current object’s context, the this context of other objects cannot be directly

named. The read of another object's state is, therefore, abstracted as a read of the other

object’s owner. I call this process of abstracting effects raising.

⊢ f : ⟨⟨this | ∅⟩ ⟨∅ | ∅⟩⟩

    e ⊨ T        ⊢ e : ϕ
    ─────────────────────────────────────
    ⊢ e.f : ϕ ∪ ⟨⟨owner(T) | ∅⟩ ⟨∅ | ∅⟩⟩

The general form of an effect rule, ⊢ e : ⟨⟨hr | hw⟩ ⟨sr | sw⟩⟩, can be read as: an expression e produces heap read and write effects hr and hw, as well as stack read and write effects sr and sw (discussed in the next section), when evaluated. The e ⊨ T form means that the expression e has a type T, where T is a valid type. Finally, the ϕ ∪ ⟨⟨owner(T) | ∅⟩ ⟨∅ | ∅⟩⟩ in the above rule is the union of two effect sets. When two effect sets are unioned, the result is the smallest set of effects, created from the two original effect sets, which includes all of the effects named in the two original sets.
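The field-read rules can be prototyped directly. The Python sketch below is my own hypothetical encoding (effects as tuples of frozensets; the function names are mine): reading a field of the current object reads `this`, while reading a field through another expression unions in a read of that expression's owner, since another object's `this` context cannot be named and the effect must be raised to the owner:

```python
# An effect is (heap reads, heap writes, stack reads, stack writes).
def effect(hr=(), hw=(), sr=(), sw=()):
    return (frozenset(hr), frozenset(hw), frozenset(sr), frozenset(sw))

def union(e1, e2):
    # Plain pairwise set union; the thesis's union additionally collapses
    # dominated contexts (see Listing 3.7).
    return tuple(a | b for a, b in zip(e1, e2))

def read_own_field():
    # |- f : <<this | 0> <0 | 0>>
    return effect(hr={"this"})

def read_field_of(expr_effect, expr_owner):
    # |- e.f : phi U <<owner(T) | 0> <0 | 0>> -- raised to e's owner
    return union(expr_effect, effect(hr={expr_owner}))
```

For instance, evaluating `x.f` where `x`'s type is owned by `dataOwner` yields a heap read of `dataOwner` together with whatever effects evaluating `x` itself produced.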

The relationship between two arbitrary contexts can be said to be one of:

• equal (=) — they are one and the same

• dominating (<) — one context is directly or indirectly owned by the other

• disjoint (#) — they appear on different branches of the ownership tree
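These three relationships can be computed for concrete contexts by walking owner chains. A hypothetical Python sketch (the `relation` name and the `owner_of` map are mine, for illustration only), returning "=", "<", or "#":

```python
def relation(c1, c2, owner_of):
    """Classify two contexts: '=' (same), '<' (one directly or indirectly
    owns the other), or '#' (different branches of the ownership tree)."""
    def ancestors(c):
        chain = {c}
        while c in owner_of:        # 'world' has no owner and roots the tree
            c = owner_of[c]
            chain.add(c)
        return chain
    if c1 == c2:
        return "="
    if c1 in ancestors(c2) or c2 in ancestors(c1):
        return "<"
    return "#"

# example tree: world owns a and b; a owns a1
TREE = {"a": "world", "b": "world", "a1": "a"}
```

Here `a` dominates `a1`, while `a1` and `b` are disjoint because their paths to `world` first meet only at the root.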

When unioning effect sets, it is desirable to eliminate contexts which are dominated by

other contexts in the same effect set to reduce the size of the set. It is also necessary

to ensure that no effect information is lost when such a union operation is performed.

To achieve these goals, the read and write sets of the two effects are unioned separately

using the algorithm shown in Listing 3.7.

Compound expressions which contain other expressions, such as the binary addition

expression shown in the rule below, ensure that the effects of evaluating the nested


union(set1, set2):
    result := set2
    outer: for item1 in set1
        for item2 in set2
            if item2 ≤ item1
                result := result − item2
            if item1 < item2
                next outer
        result := result + item1
    return result

Listing 3.7: Algorithm for unioning two sets of effects, set1 and set2. Note that the + and − operations are just set addition and subtraction.
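Listing 3.7 can be transcribed into runnable Python. The transcription below uses my own encoding of the ownership tree (the `OWNERS` map is a hypothetical example); `dominated_by(a, b)` holds when b directly or indirectly owns a, so the union keeps only the most dominating contexts while losing no effect information:

```python
OWNERS = {"this": "owner", "owner": "world", "b1": "b", "b": "world"}

def dominated_by(a, b):
    """True if context b directly or indirectly owns context a."""
    while a in OWNERS:
        a = OWNERS[a]
        if a == b:
            return True
    return False

def union_effects(set1, set2, dominated_by=dominated_by):
    """Transcription of Listing 3.7: union two effect sets, eliminating
    contexts dominated by other contexts in the result."""
    result = set(set2)
    for item1 in set1:
        covered = False
        for item2 in set2:
            if item2 == item1 or dominated_by(item2, item1):
                result.discard(item2)   # item2 <= item1: item1 subsumes it
            if dominated_by(item1, item2):
                covered = True          # item1 < item2: already accounted for
                break                   # 'next outer' in the pseudocode
        if not covered:
            result.add(item1)
    return result
```

For example, unioning a read of `this` with a read of `owner` yields just `{owner}`, since `owner` dominates `this`.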

expressions are included in the overall effects of the compound expression. This is done

by computing the effects of evaluating each of the component sub-expressions and then

unioning these effects using the algorithm just described. Note that if user-defined

operator overloading is permitted in the language to which this rule is applied, the

possible side-effects of the user-defined operation would also need to be included in the

effect set. The rule below shows an example of how the effects of a binary addition

operation are computed.

    ⊢ e1 : ϕ        ⊢ e2 : ϕ′
    ─────────────────────────
    ⊢ e1 + e2 : ϕ ∪ ϕ′

In the same way, a statement and a block of statements recursively ensure their com-

puted effect sets contain all of the effects of their constituent parts. The rule below

shows how the effects for a block of statements are calculated.

    ⊢ s : ϕ        ⊢ {s̄} : ϕ′
    ─────────────────────────
    ⊢ {s; s̄} : ϕ ∪ ϕ′

    ⊢ {} : ⟨∅, ∅⟩

As was discussed in Section 2.1, a number of features of modern imperative object-oriented languages can significantly complicate inter-procedural reasoning about side-effects. My solution to this problem is to include

method read and write effects as part of the methods’ signatures. If the programmer

supplies an effect declaration then it is necessary to ensure that the effect of the method

body is either equal to or a subset of the declared effects so that the declaration can

be relied upon during dependency analysis.
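The check that a method body's computed effect is covered by its declared effect can be sketched as follows. This is a hypothetical Python sketch with my own names (`covered`, the example `OWNERS` chain); in practice the check would be applied to the read set and the write set separately. Each context in the computed set must be equal to, or dominated by, some declared context:

```python
OWNERS = {"this": "owner", "owner": "world"}  # example ownership chain

def dominated_by(a, b):
    """True if context b directly or indirectly owns context a."""
    while a in OWNERS:
        a = OWNERS[a]
        if a == b:
            return True
    return False

def covered(computed, declared):
    """The body's effect must be equal to or a subset of the declaration,
    so the declaration can be relied upon during dependency analysis."""
    return all(any(c == d or dominated_by(c, d) for d in declared)
               for c in computed)
```

Note the asymmetry: declaring a read of `owner` covers a body that reads `this` (the effect is abstracted upward), but declaring only `this` cannot cover a body that reads `owner`.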


Constructor effects are handled in a similar way to method effects, but with one major

difference. When a constructor is invoked to initialize a newly allocated object, only

the object’s constructor has a reference to the new object. The constructor may pass

this reference to other constructors and methods it invokes, but the reference can only

escape once the constructor has returned. As a result of this, reads and writes of the

this context can be omitted from constructor read and write effects.

Listing 3.8 shows a stack implementation similar to the one used previously to illustrate

ownership syntax in Listing 3.4. This example adds the effect annotation syntax for

methods and constructors.

public class Object[owner] {
    public bool equals[paramOwner](Object|paramOwner| other)
            reads <this, paramOwner> writes <> { ... }
    ...
}

public class Stack<T>[owner, dataOwner] where T : Object|dataOwner| {
    private Element<T>|this| top;

    public void push(T data) reads <this> writes <this> {
        top = new Element<T>|this|(top, data);
    }

    public T pop() reads <this> writes <this> {
        T toReturn = top.getData();
        top = top.getNext();
        return toReturn;
    }

    public bool contains[otherOwner](Object|otherOwner| other)
            reads <this, dataOwner, otherOwner> writes <> {
        for (Element<T>|this| i = top; i != null; i = i.getNext())
            if (i.getData().equals|otherOwner|(other))
                return true;
        return false;
    }
}

public class Element<T>[owner] {
    private Element<T>|owner| next;
    private T data;

    public Element(Element<T>|owner| next, T data) reads <> writes <> {
        this.next = next;
        this.data = data;
    }

    public T getData() reads <this> writes <> {
        return data;
    }

    public Element<T>|owner| getNext() reads <this> writes <> {
        return next;
    }
}

Listing 3.8: A simple stack implementation illustrating method and constructor effect declarations. Method-level context parameters are specified using a notation inspired by C♯'s method-level generic type parameters.


There are a number of things to note from the example shown in Listing 3.8. The

Element constructor has empty read and write effects because it assigns values to only

the fields of the object being created. Normally, an assignment to an object’s fields

would generate a write of the this context, but because the only references to the

new object exist within the constructor, the effects are omitted to simplify dependence

analysis. The Element’s accessor methods read the this context as expected. The more

interesting example is the Stack's contains method. It calls the equals method on each Object stored in the stack, passing otherOwner as a context parameter. The overall

effect of the method is that it reads the stack (this), the representation of the data

items stored in the stack (dataOwner), and the representation of the method parameter

(otherOwner).

Note that effects can be described at different levels of abstraction due to the hierar-

chical nature of contexts, as was shown when discussing the computation of the effect

of reading a field. Side-effects in terms of contexts can be thought of as similar to

physical street addresses. An effect could be described as being limited to a very pre-

cise location, for example 5th Avenue, Manhattan. It would also be correct, but less

accurate, to say that the effect is located in New York City or indeed in the United

States. Effect descriptions can be abstracted to make them more inclusive. Such an

abstraction might be to make it easier to name an effect or to avoid exposing implemen-

tation details through the effect description. This abstraction and information hiding

comes at the cost of reduced effect accuracy. Extending this analogy to the problem of

reasoning about the overlap of effect sets, if we were then to observe an effect occurring

in Boston we would know that the effect described as being in New York and the effect

in Boston must be disjoint because they are in different cities. If, however, the effect

occurring in New York City were abstracted to be an effect occurring in the United

States it would no longer be possible to determine that it was disjoint from the effect

in Boston.

3.3.2 Stack Effects

Stack effects in my system are specified as a set of local variables and method parameters

that are read, and a set of locals that are written. In languages where local variables


can be shadowed, it is necessary to rename these variables so that they all have unique

names. With unique variable names, the unioning of two sets of stack effects is achieved

by a standard set union of the named elements.

A stack read effect is caused by the read of a local (a local variable or method parame-

ter). The type rule below shows the rule for reading a local named x. The type rules of

the language recursively ensure that all compound expressions and statements in the

same lexical scope as the local’s declaration preserve the local whenever it occurs in

the stack effects of any of their component expressions.

    ⊢ x : ⟨⟨∅ | ∅⟩ ⟨x | ∅⟩⟩

In most imperative object-oriented languages, like Java and C♯, local variables are

lexically scoped; they have a fixed life-time that is restricted to a written block of

program code. When the stack effects of a code block are calculated, only variables declared outside the block are included in the computed effect sets; variables declared inside the block are excluded since they are not externally visible.

Finally, if a stack local contains a reference to an object on the heap, then reading a

field of the referenced object via the local causes both a stack and a heap effect. A read of a field via a local causes a stack read effect of the local and a heap read effect

of the owner of the object accessed. If a field is written to via a local, then the total

effect of the expression includes a read of the local and a write of the owning context

of the object modified.
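The combined stack-and-heap effect of accessing a field through a local can be encoded in the same tuple form used for heap effects. This is a hypothetical Python sketch with my own names; effects are tuples of (heap reads, heap writes, stack reads, stack writes):

```python
def effect(hr=(), hw=(), sr=(), sw=()):
    return (frozenset(hr), frozenset(hw), frozenset(sr), frozenset(sw))

def read_field_via_local(local, owner_ctx):
    # x.f: a stack read of x plus a heap read of the owner of x's object
    return effect(hr={owner_ctx}, sr={local})

def write_field_via_local(local, owner_ctx):
    # x.f = v: still only a stack *read* of x, plus a heap write of the owner
    return effect(hw={owner_ctx}, sr={local})
```

Note that writing through a local still only reads the local itself: the local's binding is unchanged, and it is the referenced object's owning context that is written.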

3.4 Effect Disjointness

To determine if data dependencies are possible between arbitrary code blocks, it is

necessary to determine if their effects are disjoint or not. This section looks at how

disjointness is computed for heap and stack effects.

3.4.1 Heap Effect Disjointness

Consider how to determine the disjointness of two sets of heap effects; to do this, there

must be some basis to reason about the relationship between two contexts. Ideally,


this reasoning would be undertaken statically, that is before program execution. When

this is not possible, runtime testing of contexts to determine their relationships may be

desired to supplement the static reasoning. The construction of my ownership system

means that there are some statically known context relationships which apply to all

classes: (1) the world dominates all other contexts in the system and (2) the class’s

owner context dominates the class’s this context.

3.4.2 Facilitating Upwards Data Access

The ownership type system features discussed so far have focused on how to distinguish

between the representations of different objects. One important scenario not addressed

by the features discussed is how to allow more precise effects when reading shared

mutable state of an ancestor context (a context that directly or indirectly owns the

current context).

Figure 3.1 illustrates the ownership tree we would like to be able to support. We have

an object owned by a context c. From context c we wish to read data from context r.

If context r is not in scope (i.e. we cannot name it) then we must access r through

context b, an upward access.

Figure 3.1: Ownership relationships between contexts at runtime, used as an example of capturing context disjointness using sub-contexts.

A read of r processed by parent object b is likely to be summarized as a read of b if r is

not nameable. This abstraction would make a safe read of context r relative to context

c into a read of context b relative to context c, which would be unsafe. To avoid this

problem, I introduced the notion of sub-contexts into my system to allow it to name

subparts of an object. In the above example, the context b could be partitioned into

sub-contexts; it would “own” a finite number of named sub-contexts b1 and b2. Using


these sub-contexts reading r could be summarized as a read of b1 rather than b itself.

If the context c is located in sub-context b2, then we could safely allow the read of b1

as it is disjoint from c.

Having decided to allow contexts to be partitioned into sub-contexts, the next questions are what the scope of these sub-contexts should be and how they can be declared or created.

It is my belief that these sub-contexts represent groupings of logically related objects

and that the creation of these sub-contexts is a design choice based on the design of

the object in which they are declared. I have, therefore, decided to only allow the this

context in an object to be subdivided into sub-contexts. Further, because the design

of sub-contexts is related to the implementation details of the object they are part of,

allowing other classes to name them directly would cause implementation exposure. The

ability to directly name an object’s sub-contexts is, therefore, restricted to the object

which declared them. This means that the sub-contexts of b cannot be named directly

from context c. The sub-contexts of b can be passed to c via context parameters and

so the read of r can still be summarized as a read of b1 despite the naming restrictions.

This would allow the read of context r to be safe provided that c is not in the same

sub-context as r.

All sub-contexts are dominated by their containing object’s this context. In addition

to their use with types, I also allow sub-contexts to appear in method read and write

effect sets. Sub-contexts named as effects are abstracted in the same way as the this

context of the object they are part of. This allows more precise effect information to

be captured when desired.

Within each class, the programmer can decide if they wish to declare sub-contexts and

if they do, they can declare as few or as many as they desire. In the extreme case,

each field might be given its own sub-context, but programmers would more commonly

create a sub-context to encapsulate a group of related fields. The more sub-contexts, the

more information that needs to be passed as context arguments on types; the creation of

sub-contexts is a trade-off between precision and complexity. Sub-contexts are limited

in scope to their class of declaration. To objects that are part of the sub-context’s

representation, the sub-context looks like any other context parameter, while to the

contexts which contain the sub-contexts in their representation, they are no different


than any other context. This confines the changes required to introduce sub-contexts to the declaring class.

The idea of sub-contexts has been presented previously by other authors. Aldrich and

Chambers proposed a type system called Ownership Domains which made extensive

use of sub-contexts in the form of user-declared domains in which objects stored their

representations [3]. Clarke and Drossopoulou used them for other purposes in JoE to

provide more precise effect information [27].

3.4.3 Context Constraints

Programmers may know, when designing a class or method that a specific relationship

should hold between two or more context parameters. Capturing this information

as part of the program provides more information which could be used to statically

determine the disjointness of side-effects in terms of contexts. One way to allow the

programmer to specify these relationships would be to allow them to specify constraints

on the relationship between the two context parameters. So rather than allowing any

context parameter(s) to be supplied, the compiler would require the contexts supplied

to satisfy the stipulated relationship constraints or the program would be invalid and

would fail to compile.

Syntactically, these constraints could take a form similar to the statically enforced constraints that can be applied to generic type parameters in languages like Java and C#. I have chosen to use a constraint syntax inspired by that used by C# for its generic

type parameter constraints. I allow for four possible relationships to be stipulated

between two context parameters: dominates (>), dominated (<), dominated or equal

(<=), and disjoint (#). A context dominates another if it is directly or indirectly an

owner of the second context. A context is dominated by another if it is directly or

indirectly owned by the second context. Two contexts are disjoint if neither is part

of the other’s representation. Listing 3.9 shows examples of this syntax. In the

listing a is stipulated to be disjoint from b and c is stipulated to be dominated by a.

Note that the same syntax is used for constraints on both classes and methods.

Page 94: A Framework for Reasoning about Inherent Parallelism in ...eprints.qut.edu.au/40877/1/Andrew_Craik_Thesis.pdf · A Framework for Reasoning about Inherent Parallelism in Modern Object-Oriented

62 Chapter 3. Realization of the Abstract System

public class TestClass[owner, a, b, c] where a # b, c < a {
    ...
}

Listing 3.9: An example of context constraint syntax on a class with context parameters.

3.4.4 Runtime Ownership Tracking

The statically enforced relationship constraints, introduced in the previous subsection,

allow programmers to document design decisions about the relationships between data

in their data structures. In practice, however, there are cases where the actual rela-

tionship between contexts is not known by the programmer or compiler before program

execution. Examples of this can be found in a number of common design patterns,

including iterator and visitor patterns. Most commonly, this problem occurs when

the contexts being tested are both context parameters, either on a class or method

declaration, without any context constraints. Any context could be supplied so the

relationship between the context parameters cannot be determined in the absence of

constraints. In cases like this, there are times when it would be desirable to be able

to test the relationship between context parameters at runtime. Such a runtime test

is needed to allow code to be conditionally parallelized based on possible context relationships. This conditional parallelization capability increases the amount of inherent

parallelism that can be successfully exploited by my proposed system. It is important

to note that, even if the code to be parallelized could touch millions of memory loca-

tions, a small fixed number of context tests can be used to quickly determine if these

memory locations are disjoint or not.

One of the simplest ways to implement a runtime ownership tracking system would be

to have each object in the system keep a pointer to its owner. In such a scheme the this

context of an object is represented as the object itself and, when present, sub-contexts

are realized using objects. The ownership pointers added to objects would make it pos-

sible to traverse the ownership tree at runtime to determine the relationship between

contexts through pointer chasing. Runtime parallelism conditions most frequently in-

volve testing if two contexts are disjoint. To perform such a test at runtime using this

system, a walk from each of the nodes to the root (the world context) is performed.

The two contexts are disjoint if neither of the walks contains either of the contexts being


tested. If objects kept both a pointer to their owner and their depth in the ownership tree, then testing for disjointness could be done in O(|n − m|) comparisons (where n and m are the depths of the two contexts being compared in the ownership tree).
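The owner-pointer scheme described above can be sketched in executable form. The Java sketch below is illustrative only (the class and method names are mine, not the thesis's): each context stores an owner pointer and its depth, and the disjointness test walks the deeper context up until both sit at the same depth, taking |n − m| pointer dereferences.

```java
// Sketch of runtime ownership tracking by pointer chasing. A null owner
// denotes the root (world) context. Names are hypothetical.
class Context {
    final Context owner;
    final int depth;

    Context(Context owner) {
        this.owner = owner;
        this.depth = (owner == null) ? 0 : owner.depth + 1;
    }

    // Two contexts are disjoint iff neither is an ancestor of (or equal to)
    // the other. Walking the deeper context up to the shallower one's depth
    // costs |n - m| owner-pointer dereferences.
    static boolean areDisjoint(Context a, Context b) {
        while (a.depth > b.depth) a = a.owner;
        while (b.depth > a.depth) b = b.owner;
        return a != b;
    }
}
```

Once both contexts sit at the same depth, they can only overlap if they are the very same node; a deeper common ancestor does not violate disjointness, since neither context is then part of the other's representation.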

This simple implementation is O(1) per object in terms of memory complexity and

runtime object creation overhead. However, executing a context relationship test using this representation is O(n) in terms of execution time complexity, where n is the height of the ownership tree, although most ownership trees tend to be relatively shallow, with a maximum height of approximately eight [1]. This means that in cases where

large numbers of objects are created relative to the number of disjointness tests, this

tracking system works well. In cases where the number of disjointness tests exceeds

the number of objects created, this ownership tracking system may not offer the best

performance.

The problem of tracking ownership and testing context relationships is analogous to

the type inclusion test problem described by Wirth [124] and efficiently solved by Cohen [30]. The type inclusion test problem asks how to determine whether one type is a subtype of another when compiling a program. Cohen's solution to this problem was

to use Dijkstra views [39] to trade memory to reduce execution time complexity. This

solution can be applied to the problem of tracking and testing context relationships.

To use Dijkstra views to track ownerships, each object stores an array of pointers to

all of its ancestor contexts back to the root context, world. A disjointness test can

then be performed using the algorithm shown in Listing 3.10.

if |obj1.ancestors| == |obj2.ancestors|
    return obj1 != obj2
else if |obj1.ancestors| < |obj2.ancestors|
    return obj2.ancestors[|obj1.ancestors|] != obj1
else
    return obj1.ancestors[|obj2.ancestors|] != obj2

Listing 3.10: The algorithm for testing if two contexts are disjoint. Each object has a list of ancestor contexts which can be indexed into using []. The || operator has its usual mathematical meaning of magnitude.
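Listing 3.10 translates directly into executable form. The Java sketch below uses illustrative names of my own and assumes root-first ancestor arrays that exclude the context itself; the test then reduces to one depth comparison and at most one array lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Cohen-style ancestor arrays ("Dijkstra views") giving O(1)
// disjointness tests at the cost of O(n) work and memory per object,
// where n is the context's depth in the ownership tree.
class ViewContext {
    final List<ViewContext> ancestors; // root-first path, excluding this

    ViewContext(ViewContext owner) {
        ancestors = new ArrayList<>();
        if (owner != null) {
            ancestors.addAll(owner.ancestors);
            ancestors.add(owner);
        }
    }

    // Mirrors Listing 3.10: equal depths can only overlap via identity;
    // otherwise check whether the shallower context is the deeper one's
    // ancestor at the corresponding depth.
    boolean isDisjointFrom(ViewContext other) {
        int d1 = ancestors.size();
        int d2 = other.ancestors.size();
        if (d1 == d2) return this != other;
        if (d1 < d2) return other.ancestors.get(d1) != this;
        return ancestors.get(d2) != other;
    }
}
```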

The Dijkstra views based algorithm provides an execution time complexity for testing

the relationship of O(1). This is at the cost of an object creation runtime complexity

of O(n) and a memory complexity of O(n), where n is the height of the ownership

tree. An intermediate trade-off would be possible if the ancestor array were replaced


with a skip list [96] or similar data structure which would provide execution time and

memory complexity of O(log n), where n is the height of the ownership tree. Table 3.1

summarizes the object creation execution time and memory overhead as well as the re-

lationship test execution time complexities for these three different methods of tracking

ownerships at runtime.

Implementation     Object Creation     Object Memory     Relationship Test
                   Execution Time      Overhead          Execution Time

Pointer Chasing    O(1)                O(1)              O(n)
Dijkstra Views     O(n)                O(n)              O(1)
Skip Lists etc.    O(log n)            O(log n)          O(log n)

Table 3.1: Table showing the runtime complexity of object creation and relationship testing. Note that n is the height of the ownership tree.

Given the overheads involved with such a runtime system, there would likely need to

be some kind of switch on the virtual machine to enable/disable the use of the runtime

tracking system so that the overheads incurred can be avoided in cases where exploiting

conditional parallelism is not worthwhile. When the runtime system is turned off, all

disjointness tests would fail and only parallelism shown to exist statically is exploited.

As future work, it may be possible to implement a system where the runtime tracking

of ownerships is limited to a few classes with the class loader containing intelligence to

decide which classes should have the ownership tracking enabled. This would provide

the best-of-both-worlds in that the system could exploit conditional parallelism, but

avoid overheads on sequential code to a large degree. The details of such a system are

beyond the scope of the work in this thesis, but are a direct and logical extension of

the work presented herein.

3.4.5 Stack Effect Disjointness

Determining the disjointness of stack effects is significantly less complicated than de-

termining the disjointness of heap effects. The stack and heap are treated as separate

disjoint memories by my system, as previously discussed. The simplicity of stack ef-

fects stems from their lack of a hierarchy. To process stack effects I rely on each stack

location in scope having a unique name. In languages which permit shadowing of local

variables, my system would require local variable names to be renamed so that they


are unique. With unique names for stack locations, two effects are disjoint if they do

not name the same local variable or parameter. To determine if two code blocks can

execute in parallel, there are three tests to perform on their stack effects ⟨r1, w1⟩ and ⟨r2, w2⟩:

1. r1 ∩ w2 = ∅

2. r2 ∩ w1 = ∅

3. w1 ∩ w2 = ∅

Any overlap in these effects indicates a data dependency could exist between the two

code blocks and so my system conservatively concludes that they cannot be safely

executed in parallel.
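Under the unique-naming assumption, these three tests reduce to plain set intersections. The Java sketch below is illustrative (the helper names are mine, not the thesis's): effects are modelled as sets of variable names, and two blocks may be parallelized only when all three intersections are empty.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: stack effects as sets of unique local variable/parameter names.
// Any overlap could indicate a flow, anti, or output dependency, so the
// check is conservative: all three intersections must be empty.
class StackEffects {
    static boolean disjoint(Set<String> a, Set<String> b) {
        Set<String> overlap = new HashSet<>(a);
        overlap.retainAll(b);
        return overlap.isEmpty();
    }

    static boolean canParallelize(Set<String> r1, Set<String> w1,
                                  Set<String> r2, Set<String> w2) {
        return disjoint(r1, w2)    // test 1: r1 ∩ w2 = ∅
            && disjoint(r2, w1)    // test 2: r2 ∩ w1 = ∅
            && disjoint(w1, w2);   // test 3: w1 ∩ w2 = ∅
    }
}
```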

3.5 Realizing the Sufficient Conditions for Parallelism

Having now realized the effect system proposed in Chapter 2 using Ownership Types, it

remains to map the sufficient conditions for parallelism proposed onto this effect system.

The previous chapter looked at three different parallelism patterns: task parallelism,

data parallel loops, and pipelining. This section is divided into three parts which look

at each of these parallelism patterns and refine the statements made in Chapter 2.

3.5.1 Task Parallelism

In Section 2.3, I stated that for two tasks to be safely executed in parallel, it was

sufficient to show that no flow, output, or anti-dependencies existed between the tasks

provided no control dependencies which would prohibit parallelization existed. To show that each type of data dependency cannot exist, it is sufficient to show the following [15]:

• Flow Dependency: the read effects of a statement S1 are disjoint from the write

effects of a statement S2;

• Output Dependency: the write effects of a statement S1 are disjoint from the

write effects of a statement S2; and


• Anti-dependency: the write effects of a statement S1 are disjoint from the read

effects of a statement S2.

3.5.2 Data Parallel Loops

In the data parallelism pattern, loop iterations execute independently and are dis-

tributed across multiple processors [9]. In Section 2.4.1.1 the sufficient conditions for

employing this parallelism pattern on foreach loops were developed. In Section 2.4.1.3,

I argued that the same sufficient conditions could also be used to apply this parallelism

pattern safely to the enhanced foreach loops I developed. In this section, these suffi-

cient conditions are refined in terms of the ownership types system.

The conditions originally presented in Section 2.4.1.1 were formulated by considering

a simple foreach loop of the form shown in Listing 3.11 and generalized using a con-

ceptual loop rewriting.

foreach (T element in collection)
    element.operation();

Listing 3.11: A simple stereotypical data parallel foreach loop.

The sufficient conditions developed for the parallelization of data parallel loops were:

• Loop Condition 1: there are no control dependencies which would prevent loop

parallelization,

• Loop Condition 2: the elements enumerated by the iterator supplying elements

to the loop body must have disjoint representations, and

• Loop Condition 3: the operation mutates only the representation of the ele-

ment on which it is invoked and it does not read the representation of any other

elements in the collection, although it can read other, disjoint shared state.

Now using the ownership type and effect system I have presented in this chapter, these

conditions can be restated as:

• Loop Condition 2: all of the elements processed by the loop body must have

disjoint this contexts and must share the same owner, and


• Loop Condition 3: the operation has a write effect of at most this and all read

effects are either dominated by this or are disjoint from the owner of elements.

No writes to local variables declared outside the scope of the loop are permitted.
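The shape of the resulting parallelization can be sketched in Java. This is an illustrative helper of my own, using Java parallel streams rather than the .NET primitives the thesis targets: the loop body is distributed across threads only when the disjointness of the elements' contexts has been established, statically or by a runtime test.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: a data-parallel foreach that falls back to sequential execution
// when the disjointness of the elements' contexts cannot be established.
class ConditionalLoop {
    static <T> void forEachMaybeParallel(List<T> elements,
                                         boolean contextsDisjoint,
                                         Consumer<T> operation) {
        if (contextsDisjoint) {
            // Loop Conditions 2 and 3 hold: iterations cannot interfere.
            elements.parallelStream().forEach(operation);
        } else {
            elements.forEach(operation);
        }
    }
}
```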

3.5.3 Pipelining

The data parallelism pattern for loop parallelization can be applied only to loops with-

out inter-iteration dependencies. Loop pipelining is a technique for staging the exe-

cution of loop iterations so that no one stream of execution needs to run an entire

iteration. This allows parallelism to be extracted from some loops which have inter-

iteration dependencies. In Chapter 2, I outlined an algorithm for computing the inter-

and intra-iteration dependencies for the statements in an arbitrary loop body provided

the data elements being processed are unique:

1. Consider all statements in the loop body pairwise (including each statement with

itself):

(a) Add an inter-iteration dependency if the two statements share a flow, out-

put or anti-dependence on the representation of any object other than the

data element being processed or any stack variable declared outside the loop

body’s scope.

(b) Otherwise, if the two statements are different, add an intra-iteration de-

pendency if the two statements share a flow, output or anti-dependence on

the representation of the data element being processed or any stack variable

declared inside the loop body’s scope.

Using my ownership type and effect system, this algorithm can be restated more for-

mally as:

1. Consider all statements in the loop body pairwise (including each statement with

itself):

(a) Add an inter-iteration dependency if either statement writes to a context

which is not dominated by the element’s this context or any stack variable

declared outside the loop body’s scope.


(b) Otherwise, if the two statements are different, add an intra-iteration depen-

dency if a flow, output, or anti-dependency exists between the two state-

ments via a context dominated by or equal to this context or any stack

variable declared inside the loop body’s scope.

The dependencies computed using this algorithm can then be used to construct a DDG

for a specific pipelining algorithm. This algorithm will be further formalized and proved

in Chapter 5.

3.6 Summary

In this chapter I have shown how the effects system described in Chapter 2 can be

realized using Ownership Types. Ownership Types provides a framework for capturing

and validating representation relationships between objects using context parameters.

Effects can be expressed in terms of these ownership contexts and these can in turn be

used to reason about possible data dependencies by determining the overlap between

different effect sets. To parallelize code, it is necessary to have a basis for determining

the disjointness of contexts and proving the absence of data dependencies. There are

some context relationships intrinsic to the construction of my system which can be

exploited, but to further facilitate reasoning about the disjointness of effects, I have

also added static context relationship constraints and an optional runtime ownership

representation. These techniques will be applied to the C# language in the next chapter

and this language specific implementation of my ideas will be used to establish the

viability of my approach through the application of my ideas to representative sample

applications in Chapter 7.


Chapter 4

Application to an Existing Language

Chapter 2 presented the basic ideas and intuition behind my proposed system and in

Chapter 3, I realized these ideas using concepts from Ownership Types. This development

was originally undertaken targeting a C-style object-oriented language, but without

a specific language in mind. To validate my proposals, I need to apply them to a

representative set of sample applications. To be able to do this, I need to apply my

proposed system for capturing side-effects and reasoning about data dependencies to a

real programming language. This chapter focuses on this application and the specific

technical issues encountered in doing so.

Specifically, this chapter applies my ideas to the syntax and semantics of

a real, mainstream, commercial programming language; namely the safe subset of C#

version 3.0 (the reasons for this choice are discussed in Section 4.2). I do not consider

how to apply my effect system to C#’s unsafe code blocks, except to say that all unsafe

blocks are assumed to read and write world so that no parallelization involving that

part of the code is performed. Possible avenues of future work which might allow my

techniques to be extended to the unsafe subset of C# are discussed in Section 9.3.4.2.

This chapter contributes the design of an extended version of C# version 3.0 that I have

called Zal (Zal meaning “dawn” in Sumerian, the earliest known written language).

The details of a compiler for and the design choices made while implementing that



compiler are the focus of Chapter 6. This chapter concentrates on how to apply my proposed system to all of the different syntactic and semantic features found in C#.

Ownership Types have traditionally been applied to the Java programming language

so there are a number of small technical contributions made through my discussion of

how to apply Ownership Types to these constructs.

4.1 Language Overview

As previously stated, Zal is an extension of the C# programming language with ownership and effect information. Zal consists of syntactic extensions to C# version 3.0, a

parallelizing compiler for the language, and a runtime ownership system. Figure 4.1

shows how these different components are used to compile a program and exploit its

inherent parallelism at runtime.

Figure 4.1: The different components that are used to compile a Zal program and exploit its inherent parallelism at runtime.

During program compilation, the Zal compiler applies the sufficient conditions for par-

allelism informally stated in Chapter 2, where appropriate. There are three possible

results: (1) the program fragment analyzed cannot be safely parallelized, (2) the program fragment can always be safely parallelized, or (3) the program may be safely parallelized

depending on the relationships between context parameters at runtime. In the third

case, the compiler generates conditionally parallel code; at runtime the relationships

between contexts are established to determine if parallelism can be safely employed.

As an example of how Zal works and of runtime context relationship testing, consider the


hashtable example shown in Listing 4.1.

public class HashTable<K,V>[o,k,v] where K <: Object|k|
                                   where V <: Object|v| {
    private K[]|this| keys;
    private V[]|this|[]|this| values;

    public void Accept(Visitor<V>|v| visitor) reads <this,v> writes <v> {
        foreach (Key|k| key : keys) {
            foreach (Value|v| value : values[key.HashCode])
                visitor.Visit(value);
        }
    }
}

Listing 4.1: An example of a hashtable implementing a visitor interface that allows the values to be traversed in parallel provided the k and v contexts are disjoint.

In the hashtable example shown in Listing 4.1, the hashtable has three ownership

parameters: the owner of the hashtable (o), the owning context of the hashtable keys

(k), and the owning context of the hashtable values (v). The traversal of the values can

only be safely undertaken when k is disjoint from v; otherwise, the Visit method could

modify the values of the keys and so disrupt the traversal of the hashtable. Listing 4.2

shows the conditionally parallel code generated by the Zal compiler for the hashtable

example shown previously in Listing 4.1.

[FormalContextParameters("o", "k", "v")]
public class Hashtable<K,V> {
    private IOwnership _Context_o;
    private IOwnership _Context_k;
    private IOwnership _Context_v;
    private K[] keys;
    private V[][] values;

    [ReadEffects("this", "v"), WriteEffects("v")]
    public void Accept(Visitor<V> visitor) {
        if (_Context_k.IsDisjointFrom(_Context_v)) {
            Parallel.Foreach(keys, (Key key) => {
                Parallel.Foreach(values[key.HashCode],
                    (Value value) => visitor.Visit(value));
            });
        } else {
            foreach (Key key : keys) {
                foreach (Value value : values[key.HashCode])
                    visitor.Visit(value);
            }
        }
    }
}

Listing 4.2: The C# implementation of the hashtable example shown in Listing 4.1.


Zal also allows programmers to statically constrain the relationships between con-

text parameters on declarations. The compiler verifies that these static context con-

straints are satisfied. Adding the constraint where k # v to the list of constraints on

the hash table shown in Listing 4.1 would allow the compiler to generate only the

Parallel.ForEach implementation of the loop. These constraints restrict the context

parameters that can be accepted by a type or method, but allow the programmer to

capture design intent in their programs.

In Listing 4.2, notice how each of the context parameters has been translated into a

field; full details of this implementation of ownership tracking are provided in Chapter 6. Also notice that the traversal of the hashtable’s values has been conditionally

parallelized. When the compiler undertakes parallelism analysis, it unconditionally

parallelizes as much of the program as possible. When unconditional parallelism is

not possible because the relationship between two or more context parameters is

unknown, the compiler emits a conditionally parallelized implementation; the parallel

implementation is used only when it is safe to do so.

Zal’s ownership system is best described as a descriptive hierarchical ownership system

with effects. This means that in the Zal ownership system, each object has a single,

immutable owner and that the side-effects of executable definitions are expressed using

the ownership system. This is broadly similar to the systems underlying MOJO [25]

and Deterministic Parallel Java [18] as well as the predecessors of these languages.

My proposed system also incorporates ideas from a number of existing

ownership systems including JoE [27] and Universe Types [88].

The novel distinguishing features of Zal are:

1. more expressive context relationship constraint clauses

2. dynamic ownership tracking with runtime context relationship checks

3. the handling of a number of different syntactic features including:

(a) static fields

(b) user-defined value types

(c) inline function declarations (lambda expressions)


(d) LINQ expressions

The dynamic ownership tracking and the syntactic features highlighted above are novel.

Context relationship constraint clauses have previously been proposed in JoE [27]

and MOJO [25]. JoE’s ownership constraint clauses had only a domination opera-

tor (<) [27]. MOJO proposed a disjointness operator and an intersection operator, but

did not include the domination operator [25]. Zal, therefore, builds on a number of

existing ownership systems and adapts ideas from them for reasoning about possible

data dependencies in C#.

4.2 Choice of Language

Before proceeding to discuss the details of applying Ownership Types to the C# programming language, a discussion of the reasons for choosing it as the base language is

warranted. As was stated in Section 1.2, this thesis is focused on reasoning about par-

allelism in mainstream strong, statically-typed, imperative, object-oriented languages.

This focus limits the choice of language, but there were still several alternatives to

consider.

The first consideration was that this thesis is focused on developing techniques for

reasoning about parallelism which can be used by a large proportion of programmers

to write general purpose programs of varying sizes. This removed many specialist and

research languages, such as Scala, from consideration.

Secondly, as discussed in Section 3.3, my realization relied on a restricted stack refer-

encing model. This ruled out languages which do not employ a restricted stack model,

for example C++. This left Java, C#, and Visual Basic.NET as the main languages for

consideration. Note that the “unsafe” portion of the C# language does not satisfy the

restricted stack referencing model adopted during the realization of the effect system.

Fortunately, unsafe code segments must be clearly identified in C# and require a specific compiler flag to be set. As a result, most C# programs do not make use of these

“unsafe” features and so they satisfy the adopted model. C# has the added advantage

of making it easy to explore expanding the system to less constrained stack models,


through its unsafe language features, in the future (see Section 9.3.4.1 for a discussion

of this future work).

Out of the three remaining languages (Java, C#, and Visual Basic.NET), the best

studied in existing academic literature at the time was Java. The use of Java in the

literature means that there is a large collection of academic benchmarking suites. Own-

ership type systems have generally been validated using extended versions of the Java

programming language. Java also has the advantage of having several well studied,

free, open-source implementations available.

Despite Java’s advantages, it also has some serious disadvantages. First, and most

serious, is the retrofitted generics implementation in Java. Generics allow user-defined

types to be parameterized with additional types so that objects can be specialized

at creation time. When generics were added to the Java programming language, the

designers chose to maintain backwards compatibility with previous versions of the lan-

guage in the virtual machine. This meant that the generics were implemented using

type erasure; the generic type information is erased by the compiler and it is not avail-

able at runtime. This is not ideal because the ownership type parameters required

by my realization would be implemented using the same infrastructure as the generic

type parameters. As discussed in Section 3.4.4, a runtime ownership tracking system

is necessary to help facilitate reasoning about the disjointness of contexts.

C# and Visual Basic.NET, unlike Java, both persist generic type information through

to the virtual machine at runtime. The official compilers for these languages are not

publicly available, but the language specifications are freely accessible. C# does not

have the same support in the academic community as Java and so finding benchmarks

and other similar tools is more difficult, but there are a large number of programs written in C#. The similarity of the core syntax used by Java and C# also allows for relatively easy porting of existing Java programs to C#.

Ultimately, I decided to work with C# because it offers a number of advanced language

features, such as LINQ and user-defined value types, not found in Java and because

it persists generic type information through to runtime. C# also boasts a familiar

syntax and a large following of programmers writing general purpose applications.

The existing C# expertise within the Microsoft QUT eResearch Centre and QUT’s


support for research involving the language also influenced my choice of language. My

system could have been applied just as well to Java or Visual Basic.NET. My extended

version of the C# version 3.0 language developed as part of this thesis, Zal, is presented

informally in the remainder of this chapter and formalized in the next.

4.3 Syntactic Features

Ownership Types research has traditionally been undertaken using the Java programming language. The C# programming language used in this project has a number of syntactic constructs not found in Java. There is no literature documenting the application of Ownership Types to C# itself or to some of its more advanced language features.

All of the basic syntactic changes proposed in Chapter 3 can be applied directly to the C# language. This section focuses on the annotation of the more advanced syntactic language features, not previously discussed, with ownership and effect information.

4.3.1 Basic Syntax

In Chapter 3 I discussed how my proposed effect system could be realized using ownership types. This discussion focussed on classes and methods, the core constructs of most object-oriented languages. In the course of this discussion I presented several different code samples using a syntax similar to that of Java and C#. The syntax presented in these sections was not extensively discussed and the code snippets were primarily used to provide clarifying examples for the discussion. Having decided to apply my proposed type and effect system to C#, I begin by revisiting the changes required to classes and methods to implement my proposed type and effect system. The remainder of this chapter discusses the syntax and semantics of Zal.

4.3.1.1 Class-Level Context Parameters

Classes, in an ownership system, are parameterized with formal context parameters. The first formal context parameter is special and is referred to as the class's owner. An object's owning context specifies which object's representation it is part of. The other formal context parameters are all used to construct types within the class definition. In Zal, a class's formal context parameters are listed as part of the class definition immediately after the class name and any generic type parameters. To facilitate parsing, I have chosen to delimit the list of formal context parameters using []s. Listing 4.3 shows a class parameterized with several different formal context parameters.

public class Example[owner, a, b] {
    ...
}

Listing 4.3: An example of a class parameterized with formal context parameters owner, a, and b.

When a type is named, actual context parameters must be supplied for the formal

context parameters. The actual context parameters may be the special contexts this

(the context storing the representation of the current object) and world (the special

top context) or a context parameter currently in scope. The actual context parameters

on a type reference are listed between ||s immediately after the type name and any

generic type parameters. Listing 4.4 shows the naming of the type of a field within a

class using actual context parameters.

public class Example[owner, a, b] {
    public Example|this, world, a| next;
}

Listing 4.4: An example of a field with actual context parameters this, world, and a.

Last, but not least, when a class with context parameters is extended, actual context

parameters must be supplied on the type reference just as is done with generic type

parameters. An example of this is shown in Listing 4.5.

public class Parent[owner, a] { ... }
public class Child[owner, data, source] : Parent|owner, source| { ... }

Listing 4.5: An example of a class extending a class which is parameterized with context parameters.

4.3.1.2 Method-Level Context Parameters

Like some previous ownership systems [27, 25, 49], I allow methods to be parameterized with formal context parameters. Unlike the class formal context parameters, there is no special meaning attached to the first context parameter declared on a method. The context parameters on a method allow the method to construct types that cannot be constructed using the containing class's context parameters alone. A method's formal context parameters, if any, are listed between []s following the method name and any generic type parameters. An example of a method with formal context parameters is shown in Listing 4.6.

public class Example[owner, a] {
    public void method[data1, data2](Object|data1| dataObj1,
                                     Object|data2| dataObj2) {
        ...
    }
}

Listing 4.6: A method definition with formal context parameters.

When a programmer invokes a method with formal context parameters, they supply the actual context parameters explicitly at the call site. The actual context parameters are listed in the same positions as the formal context parameters, but between ||s.
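To make the call-site syntax concrete, the following sketch (my own example; the variables and the context a are hypothetical names assumed to be in scope, not taken from any numbered listing) invokes the method declared in Listing 4.6 with explicit actual context parameters:

Example|world, a| ex = ...;
Object|this| o1 = ...;
Object|world| o2 = ...;
ex.method|this, world|(o1, o2);

Here the actual contexts this and world are supplied for the formal method context parameters data1 and data2, in the same positions, but between ||s.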

When invoking a generic method in C#, the compiler can often infer the type parameters from the types of the actual parameters supplied at the call site. Listing 4.7 shows an example of a generic method to concatenate the string representations of two parameters which makes use of this type parameter inference. The same inference techniques could be used to infer the actual context parameters to a method when actual context parameters are not supplied by the programmer. This would reduce the number of annotations that need to be supplied by the user without reducing the flexibility or expressiveness of the system. The prototype implementation of Zal described in Chapter 6 does not currently support such context parameter inference, but adding this support is only a matter of additional engineering work in the compiler.

public string Concat<T>(T a, T b) {
    return a.ToString() + b.ToString();
}
...
int val1 = 1, val2 = 2;
string result2 = Concat(val1, val2); // the int type parameter is inferred

Listing 4.7: An example of the inference of type parameters to a generic method.

Page 110: A Framework for Reasoning about Inherent Parallelism in ...eprints.qut.edu.au/40877/1/Andrew_Craik_Thesis.pdf · A Framework for Reasoning about Inherent Parallelism in Modern Object-Oriented

78 Chapter 4. Application to an Existing Language

4.3.1.3 Context Constraints

Reasoning about parallelism requires reasoning about the disjointness of side-effects. In my proposed effect system which uses ownership types, reasoning about the disjointness of effects requires some basis for reasoning about the relationships between two contexts. As was discussed in Section 3.4.1, there may be times when a designer of a class or method knows that a specific relationship should hold between two or more context parameters. Capturing this knowledge provides additional information which can be used to help determine the relationships between context parameters.

In Zal, definitions which are parameterized by formal context parameters may have one or more constraint clauses describing the relationships amongst the declared context parameters and between the declared context parameters and any other contexts in scope. C# uses where clauses to put constraints on generic type parameters and, for consistency, I have adopted a similar notation for context parameters.

There are four context relationships, shown in Table 4.1, which can be stipulated between two context parameters in a constraints clause. These four context relationship constraints are based on the relationships between contexts which, when established, are sufficient to facilitate the safe exploitation of inherent parallelism as was discussed in Sections 3.4 and 3.5. The nested hierarchy of contexts in the ownership system used by Zal results in contexts being either nested inside one another or disjoint from one another. The relationships in Table 4.1 allow these relationships to be asserted by the programmer.

Relationship               Description
dominates (>)              The context on the right is part of the representation of the context on the left.
dominated (<)              The context on the left is part of the representation of the context on the right.
dominated or equals (<=)   The context on the left is either part of the representation of the context on the right or is the same context as that on the right.
disjoint (#)               The two contexts are in different branches of the ownership tree.

Table 4.1: The four context relationships which can be stipulated in a Zal context constraint clause and their meanings.

When supplying context constraints on a definition with formal context parameters, the constraints appear at the end of the definition immediately before the opening { of the definition body or the terminal ; just like a generic type parameter constraint in C#. Each constraint relationship is listed in its own where clause as is done with generic type parameters. Listing 4.8 shows an example of this constraint syntax on a class definition.

public class ConstraintExample[owner, a, b, c] where a # b where c <= a {
    ...
}

Listing 4.8: An example of a class annotated with context parameters and context constraints using Zal's syntax.

Note that when dealing with nested definitions, like a method inside a class, at least

one of the two contexts in a constraint must be in the list of formal context parameters

declared on the definition of the method. For example, a method cannot specify a

constraint between two of its containing class’s context parameters.
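As an illustrative sketch (this example is mine, not one of the numbered listings; the class and method names are hypothetical), the following shows the difference:

public class Outer[owner, a, b] {
    // valid: context c is declared on the method itself
    public void allowed[c]() where c # a { ... }

    // invalid: constrains only the class's context parameters a and b
    public void rejected() where a # b { ... }
}

A constraint such as where a # b belongs on the class definition, where a and b are declared.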

When actual parameters are supplied for formal context parameters with stipulated

constraints, the compiler tries to statically verify that the constraint is satisfied. If the

compiler cannot determine that the constraint is satisfied, the program is considered

invalid and will not compile.
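For instance, a hypothetical sketch using the ConstraintExample class from Listing 4.8, with contexts x and y assumed to be in scope and x # y already established:

ConstraintExample|world, x, y, x| ok;      // satisfies a # b (x # y) and c <= a (x <= x)
ConstraintExample|world, x, x, y| invalid; // rejected: x # x cannot be established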

4.3.1.4 Method Effect Declarations

In Section 3.3, I discussed the need to declare effects on method definitions. The main reason for this was to facilitate reasoning about method side-effects in the presence of overriding. By having overriding and overridden methods declare their effects, they can be checked easily to make sure that the overriding method's effects are a subset of the side-effects of the method being overridden. The method body of a method with a set of declared effects must not exceed the effect declaration, that is, the method body's overall effects must be a subset of those declared.

In C#, unlike Java, methods are not virtual by default. Only methods which are explicitly marked with the abstract, override, or virtual keywords in their declaration can be overridden. This means that methods which are not defined to be abstract, virtual, or override do not need to declare their effects; the effects can simply be computed from the method body. Effects may still be declared on methods when not required and, where supplied, the effect of the method body will be validated against the declared effects. By not requiring effect declarations on all methods, the number of annotations which need to be added to a program is reduced, which makes the system easier to use. To ensure effects can still be calculated even when the method source is not available, for example when compiling against a Dynamic-Link Library (DLL), the effects of a method are stored as part of the method's signature after compilation regardless of whether they were explicitly declared or not. Allowing effect declarations to be selectively omitted serves to reduce the programmer's annotation burden. However, to facilitate parallelization in the presence of code reuse, the use of effect declarations on public Application Programmer Interfaces (APIs) would be necessary and code analysis tools could be used to help enforce this.

Data dependencies exist only when there is an update to shared mutable state; overlapping access to unmodified shared mutable state is safe and does not result in a data dependency. I have, therefore, decided to list the sets of contexts read and written separately to reduce the number of false-positive data dependencies detected by the system. If I chose to amalgamate the read and write effects into a single effect declaration, then overlapping reads would be detected as possible data dependencies. The effect sets are listed on the method definition after the formal call parameters and before any context constraints. An example of the syntax is shown in Listing 4.9.

public class MethodExample[owner] {
    public virtual void operation[a, b, c]() reads <this,a,b> writes <c>
            where a # b {
        ...
    }
}

Listing 4.9: An example of an instance method annotated with effect declarations and context constraint clauses.

Default Effects

If a method invoked from a program written in Zal has no effect information, for example a method written in C#, then the compiler assumes that the method reads and writes the world context — anything on the heap. This is the safest assumption as it ensures no dependencies are possibly violated, but it hinders parallelization.
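For example (a sketch of my own; the method below is hypothetical), a call to an unannotated C# method is treated as though it had been declared with the maximal effects:

// assumed signature for an unannotated library method
public int Parse(string input) reads <world> writes <world>;

Any loop containing such a call therefore cannot be proven free of data dependencies, even if the method is in fact side-effect free.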


4.3.2 Subroutine Constructs

The discussion of how to realize my effect system in Chapter 3 considered only instance methods on objects. C#, like many other languages, has a number of additional syntactic constructs to define subroutines. Like methods, these mechanisms need to be annotated with effect information when they are declared to be overriding or virtual. These effect annotations can be used to ensure effect consistency in the presence of overriding. The following subsections discuss C#'s subroutine abstractions and their annotation with ownership and effect information.

4.3.2.1 Properties

C# properties are syntactic sugar designed to simplify the task of writing and using simple object getter and setter methods. The property is used like a field and the compiler handles converting the code to call the property's get or set method based on how the property is used. Listing 4.10 is an example of an Employee class with a firstName field that is made readable and writeable through the FirstName property.

public class Employee
{
    private string firstName;
    public string FirstName
    {
        get {
            return firstName;
        }
        set {
            if (value == "")
                throw new InvalidOperationException();
            firstName = value;
        }
    }
}

Listing 4.10: An example of a property being used to read and write a field.

Listing 4.11 shows how the property in Listing 4.10 would be implemented using methods.


public class Employee
{
    private string firstName;

    public string getFirstName() {
        return firstName;
    }

    public void setFirstName(string value) {
        if (value == "")
            throw new InvalidOperationException();
        firstName = value;
    }
}

Listing 4.11: Code snippet which shows how the property in Listing 4.10 could be implemented using methods.

While properties are syntactic sugar for accessor and mutator methods, the C# language specification does not allow properties to have either generic type parameters or call parameters other than the implied value parameter for the setter. Like methods, properties can be overridden when the original definition was marked virtual.

As was discussed in Section 4.3.1.4, programmers are required to declare the read and write effects of overriding and virtual methods to facilitate effect consistency checking during overriding. I made the design decision to add ownership and effect information to properties in a manner consistent with the existing C# design decisions. To be consistent with the lack of generic type parameters on properties, I decided not to allow properties to be parameterized with formal context parameters. I also decided that the effects of reading or writing a property must be declared if the property is declared to be abstract, override, or virtual. This ensures that the effects of the accessors can be kept consistent during overriding. Effect consistency during overriding means that the effects of the overriding get accessor do not exceed the declared effects of the overridden get accessor, and the same for the set accessor. In non-virtual properties, the effect declarations are optional as the effects can be computed from the accessor bodies and the property cannot be overridden. In the Zal syntax, these declarations take the form of read and write effect sets attached to the property's get and set accessors, where present.

Listing 4.12 shows the syntax chosen for accessor effect declarations. When effect declarations are supplied, it is necessary to ensure that the accessor body effects are consistent with their declared effects and the effects of any accessors overridden, just as it is with methods as was previously discussed.

public class Employee[owner]
{
    private string firstName;

    public string FirstName
    {
        get reads <this> writes <> {
            return firstName;
        }
        set reads <> writes <this> {
            if (value == "")
                throw new InvalidOperationException();
            firstName = value;
        }
    }
}

Listing 4.12: An example of a property annotated with read and write effects.

Automatic Properties

To further reduce the amount of code needed to write simple properties which just expose an underlying field, C# version 3.0 introduced automatic properties [78]. With an automatic property, the field the property exposes becomes implicit rather than explicit as shown in Listing 4.13.

public class Employer
{
    public string FirstName
    {
        get;
        set;
    }
}

Listing 4.13: An example of an automatic property which does not require an explicit field or accessor implementations.

The implementation details of automatic properties are well known and defined in the language specification and so effect declarations are not required for automatic properties, even if they are declared to be abstract, override, or virtual. All automatic properties have the following read and write effects depending on whether they are instance properties or static properties:


• get - instance: reads this and writes nothing
        static: reads the current class's static context (see Section 4.4) and writes nothing

• set - instance: reads nothing and writes this
        static: reads nothing and writes the current class's static context (see Section 4.4)

Omitting the effect declarations on automatic properties maintains the simplicity of

the syntax (one of the reasons it was added to the language) without losing any effect

precision; the effects can be inferred from the lack of an accessor implementation. It

is important to note that even though the automatic properties do not carry effect

declarations, their effects must still be consistent in the presence of overriding.

4.3.2.2 Indexers

Indexers are another piece of C# syntactic sugar which are designed to allow access to an object's state using an array index style syntax with any number and type of indices. Indexers are written in a very similar style to properties using get and set accessors. Listing 4.14 shows an example of an indexer. This indexer takes a day and a calendar and uses that to convert the day into a number.

public class Calendar {
    public int firstDayOfWeek = 0;
}

public class Week {
    public string[] days = {"Sunday", "Monday", "Tuesday", "Wednesday",
                            "Thursday", "Friday", "Saturday" };

    public int this[string day, Calendar cal] {
        get {
            int dayIndex =
                Array.LastIndexOf(days, day) - cal.firstDayOfWeek;
            return dayIndex < 0 ? dayIndex + 7 : dayIndex;
        }
    }
}

Listing 4.14: An example of an indexer used to convert day names into numerical days of the week from a defined starting point.

As was the case with properties, the accessors in abstract, override, or virtual indexers are required to have effect declarations to ensure consistency during overriding. Accessors in non-virtual indexers may declare their side-effects, but the effect declaration is not required as the effects can be inferred from the accessor's implementation.

Like properties, indexers cannot be parameterized with generic type parameters. In deciding whether to allow indexers to be parameterized with context parameters, I have again chosen to remain consistent with the existing C# language. As a result, Zal does not allow indexers to be parameterized with context parameters. Listing 4.15 shows the syntax for declaring and using an indexer whose parameter types carry actual context parameters and whose accessor declares its effects.

public class Calendar[owner] {
    public int firstDayOfWeek = 0;
}

public class Week[owner] {
    public string[]|this| days = {"Sunday", "Monday", "Tuesday",
                                  "Wednesday", "Thursday", "Friday", "Saturday" };

    public int this[string day, Calendar|owner| cal]
            reads <owner> writes <> {
        get {
            int dayIndex =
                Array.LastIndexOf(days, day) - cal.firstDayOfWeek;
            return dayIndex < 0 ? dayIndex + 7 : dayIndex;
        }
    }
}

Listing 4.15: An example of how to annotate an indexer with context parameters and effects. The classes annotated were previously shown in Listing 4.14.

4.3.2.3 Delegates

In C#, delegate objects represent methods; they are essentially type-safe function pointers. C# allows delegates to be returned from methods, passed as parameters, and stored on the stack or in the heap as with any other data, thus making methods first-class citizens of the language. When wrapping a method in a delegate and passing it around, it is necessary to carry the call parameter and return type information as part of the delegate's type. Listing 4.16 shows the C# syntax for declaring a delegate type called BinaryOp taking two Objects and returning an Object.

public delegate Object BinaryOp(Object objA, Object objB);

Listing 4.16: The syntax of a delegate taking two Objects and returning an Object.

When a method is assigned to a delegate, C# ensures that the method's types are consistent with the delegate's types. The method's return type is allowed to vary covariantly and the method's parameter types are allowed to vary contravariantly. So in the case of the BinaryOp delegate shown in Listing 4.16, the method assigned to the delegate would have to accept two Object parameters, but could return any type since all types are subtypes of object. Trying to assign a method that accepted only int parameters would fail. This ensures type correctness is preserved when delegates are used.
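These variance rules can be sketched in plain C#, without Zal annotations (the method names below are my own, not from the thesis):

public delegate Object BinaryOp(Object objA, Object objB);

public static string Describe(Object a, Object b) { return a + " " + b; }
public static int AddInts(int a, int b) { return a + b; }

BinaryOp ok = Describe;    // allowed: the string return type varies covariantly to Object
// BinaryOp bad = AddInts; // rejected: int parameters do not vary contravariantly from Object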

Delegate type declarations, as with all other type declarations in the C# language, can have type parameters and associated constraints on those parameters. When a delegate type is named, the compiler must verify that the type parameters, if any, are supplied and that the actual types supplied satisfy the type constraints on the delegate declaration.

With the addition of effects to method signatures (see Section 3.3), it becomes necessary to carry read and write effects with the delegate's type to facilitate dependency analysis. The context parameters for the parameter and return types of the delegate may not be known when the delegate is declared, and so it is necessary to add context parameters to the declaration just as for methods. Listing 4.17 shows a version of the BinaryOp delegate originally shown in Listing 4.16 annotated with context parameters and side-effects. Delegate declarations may also have context constraint clauses which must be satisfied when an instance of the delegate type is named.

public delegate Object|C| BinaryOp[A,B,C](Object|A| objA, Object|B| objB)
    reads <A,B> writes <>;

Listing 4.17: An example of the use of context parameters and effect declarations on the delegate originally shown in Listing 4.16.

When a delegate type is named, actual contexts must be supplied for the formal context

parameters, if any. Further, the supplied context parameters must satisfy any declared

context constraints on the delegate otherwise the delegate type is invalid.
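For example, a sketch of my own using the annotated BinaryOp delegate from Listing 4.17: naming the type requires supplying actual contexts for the formal parameters A, B, and C:

BinaryOp|this, this, world| op;

If the delegate declaration carried a constraint, say where A # B, then this particular naming would be rejected because this # this cannot hold.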

When a method is assigned to a delegate, the input parameter types and return types must be checked for compatibility. Just as with generic type parameters, context parameters must be equal for the types to be compatible, as was discussed in Section 3.2.3. While the types themselves are allowed to vary as previously discussed, the context parameters are not, as allowing this would open a hole in the type system. Further, the effect declarations of the delegate and the method being assigned to it need to be checked to ensure that the method's declared effects are a subset of the effects of the delegate. In this way, the delegate effects can be taken as the maximum effects of any method assigned to it and it is not necessary to resolve which specific method the delegate may invoke.

Finally, the reading and writing of variables of delegate type is treated like the read or write of any other variable and produces the appropriate side-effects as discussed in Section 3.3. Invoking a delegate stored in a variable causes a read of the variable in addition to the read and write effects of the delegate itself.

4.3.2.4 Events

In software systems, there is frequently a need for parts of the application to be notified

of changes in the state of the system caused either internally or externally. For example,

in a program with a GUI, one or more pieces of code may want to be notified when a

particular button has been clicked or a particular option selected. The changes in the

state of the system are called events and many software patterns make use of them.

C# provides syntactic support for events. In C#, an event is a list of delegates. An event can be fired using method invocation-style syntax. When an event is fired, all of the delegates in the event list are invoked sequentially with the supplied actual parameters. Delegates can be added to and removed from the list of observers. Once a change in state occurs, all of the methods stored in the event list can be invoked from a single line. An example of an event is shown in Listing 4.18; invoking listeners will invoke all of the methods which have been stored in the listeners list of delegates.

Events cannot be aliased and so no direct changes to the event declaration syntax itself are required. The changes to delegate types, namely the addition of context parameters and effect declarations, are indirectly consumed by the event construct.

public class EventSource {
    public delegate void EventHandler(EventSource source);
    public event EventHandler listeners;

    public void fireEvent() {
        listeners(this);
    }
}

public class Subscriber {
    public Subscriber(EventSource target) {
        target.listeners += eventListener;
    }

    public void eventListener(EventSource source) { ... }
}

Listing 4.18: An example of a simple C# event using the delegate type EventHandler.

Event fields, like other fields, may be either static or instance fields. Invoking the methods stored in an instance event field causes a read of the containing object's this context. Similarly, invoking the methods stored in a static event field causes a read of the containing class's static context. Adding methods to or removing methods from an event causes a write of the containing object's this context for instance fields and the containing class's static context for static events. Listing 4.19 shows the same event example previously shown in Listing 4.18, but this time annotated with context parameters and effect declarations.

public class EventSource[owner,observer] {
    public delegate void EventHandler(EventSource|owner| source)
        reads <this,observer> writes <observer>;

    public event EventHandler listeners;

    public void fireEvent() reads <this,observer> writes <observer> {
        listeners(this);
    }
}

public class Subscriber[owner,tgtOwner] {
    public Subscriber(EventSource|tgtOwner, this| target)
            reads <tgtOwner> writes <tgtOwner> {
        target.listeners += eventListener;
    }

    public void eventListener(EventSource|tgtOwner, this| source)
            reads <tgtOwner,this> writes <this> { ... }
}

Listing 4.19: An example of the simple C# event previously shown in Listing 4.18, now annotated with context parameters and effect declarations.


4.3.2.5 Anonymous Methods

Anonymous methods were introduced in C# version 2 and are simply methods without a name. The method is declared inline with the code and is assigned to a delegate for use elsewhere in the program. Common uses for anonymous methods in C# include the installation of event handlers.

Anonymous methods operate in a similar manner to normal methods. If the anonymous method needs to have access to input parameters, then these must be declared as part of the anonymous method's signature and these must be compatible with the parameters of the delegate the method is being assigned to. The return type of the anonymous method is inferred from the delegate the anonymous method is being assigned to and the method body is validated against that inferred type. This is to say that the body of the anonymous method is validated against the declared signature of the method, which is a combination of explicit parameters and an inferred return type. Listing 4.20 shows an example of two anonymous methods which are assigned to the add and sub fields of the simple calculator.

public class RPNCalc {
    public delegate int BinaryOp(int a, int b);

    public static readonly BinaryOp add =
        delegate(int a, int b) { return a + b; };

    public static readonly BinaryOp sub =
        delegate(int a, int b) { return a - b; };

    Stack<int> values = new Stack<int>();

    public void Push(int value) {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) {
        values.Push(op(values.Pop(), values.Pop()));
    }
}

Listing 4.20: An example of a simple Reverse Polish Notation calculator which defines binary operations to be applied to the stack as delegates. The calculator supplies two operations, add and sub, via anonymous method declarations.

The syntax and semantics of the anonymous method declaration operate much like a


method declaration, with the supplied method body being checked for correctness against

the signature. Anonymous methods cannot be overridden; they cannot be named. As

is the case with non-virtual methods, it is not necessary to declare the method’s side-

effects as part of the method signature — the effects can be computed by the compiler.

If an effect declaration is made, then the declared effects are checked against the actual

side-effects of the anonymous method’s body — the body effects must be a subset

of those declared. Delegate type compatibility checking is performed as previously discussed when the anonymous method is assigned to a delegate. As with standard methods, the

anonymous method may have context parameters. The previous calculator example is

repeated with the addition of context parameters and effect declarations in Listing 4.21.

public class RPNCalc[owner] {
    public delegate int BinaryOp(int a, int b) reads <> writes <>;

    public static readonly BinaryOp add =
        delegate(int a, int b) reads <> writes <> { return a + b; };

    public static readonly BinaryOp sub =
        delegate(int a, int b) reads <> writes <> { return a - b; };

    Stack<int>|this| values = new Stack<int>|this|();

    public void Push(int value) reads <this> writes <this> {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) reads <this> writes <this> {
        values.Push(op(values.Pop(), values.Pop()));
    }
}

Listing 4.21: An example of an ownership annotated Reverse Polish Notation calculator class based on the original C# example shown in Listing 4.20. Note the effect declarations added to the anonymous methods.

Outer Variables

Anonymous method bodies are allowed to refer to local variables and fields defined

outside the scope of the anonymous method’s body. The local variables from the code

surrounding an anonymous method which are used in the anonymous method are called

outer variables. C# implements full lexical closures, so changes to the local variables

made by the anonymous method can be observed by the surrounding code and vice


versa. Listing 4.22 shows an example of an anonymous method returned from the

operation method which captures the outer variables i and j.

public class OuterVarEx {
    public delegate int Op();

    public Op operation() {
        int i = 20;
        {
            int j = 30;
            return delegate { return i + j; };
        }
    }
}

Listing 4.22: A code listing showing the capture of local variables from two different scopes in the anonymous method returned from the operation method.
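The sharing of outer variables works in both directions. The following sketch (plain C#; the class and variable names are illustrative only, not part of the examples above) shows a write made inside an anonymous method being observed by the surrounding code, and vice versa:

public class SharedCounter {
    public delegate void Step();

    public void Demo() {
        int count = 0;                      // captured as an outer variable
        Step s = delegate { count += 1; };  // writes the shared count
        s();
        s();
        Console.WriteLine(count);           // the surrounding code observes 2
        count = 10;                         // ... and the delegate observes this write
        s();
        Console.WriteLine(count);           // prints 11
    }
}

Because count is shared rather than copied, the delegate body and the enclosing method read and write the same storage location.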

The C# compiler implements outer variables by extracting them into private inner classes based on their declaration scope. Anonymous methods are turned into methods on these private inner classes, again based on the scope of the anonymous method's declaration. Listing 4.23 shows how the C# compiler would implement the outer variables and anonymous method shown in Listing 4.22. Note that the compiler would use a different naming scheme, but the example serves to demonstrate the transformation employed.

public class OuterVarEx {
    public delegate int Op();

    private class OperationScope1 {
        public int i;
    }

    private class OperationScope2 {
        public OperationScope1 outerScope;
        public int j;

        public int AnonMethod1() {
            return outerScope.i + j;
        }
    }

    public Op operation() {
        OperationScope1 scope1 = new OperationScope1();
        scope1.i = 20;
        {
            OperationScope2 scope2 = new OperationScope2();
            scope2.outerScope = scope1;
            scope2.j = 30;
            return new Op(scope2.AnonMethod1);
        }
    }
}

Listing 4.23: An example showing how the C# compiler would implement the example shown in Listing 4.22 using private inner classes.


This transformation of outer variables into fields of an inner class means that a read

of a local variable which would have originally incurred only a local stack effect must

now cause a heap effect as well. This means that a data flow analysis needs to be

performed to identify which local variables are actually outer variables. Once outer

variables have been identified, a read of such a local variable must generate a read of the this context and an assignment to it must generate a write of the this context, regardless of whether that read or write occurs in the anonymous method or not. This ensures that

even if the anonymous method is invoked from another part of the program, the effect

on the captured local variables will be contained in the effect sets and the appropriate

data dependencies generated. Listing 4.24 shows how the implementation shown in

Listing 4.23 would be annotated with context parameters and effect declarations. Listing 4.25 shows how the original anonymous method would be annotated with context parameters and effect declarations.

public class OuterVarEx[owner] {
    public delegate int Op() reads <this> writes <>;

    private class OperationScope1[owner] {
        public int i;
    }

    private class OperationScope2[owner] {
        public OperationScope1|owner| outerScope;
        public int j;

        public int AnonMethod1() reads <owner> writes <> {
            return outerScope.i + j;
        }
    }

    public Op operation() reads <this> writes <> {
        OperationScope1|this| scope1 = new OperationScope1|this|();
        scope1.i = 20;
        {
            OperationScope2|this| scope2 = new OperationScope2|this|();
            scope2.outerScope = scope1;
            scope2.j = 30;
            return new Op(scope2.AnonMethod1);
        }
    }
}

Listing 4.24: An example showing how the implementation of the example shown in Listing 4.22 would be annotated with context parameters and effect declarations.


public class OuterVarEx[owner] {
    public delegate int Op();

    public Op operation() reads <this> writes <> {
        int i = 20;
        {
            int j = 30;
            return delegate { return i + j; };
        }
    }
}

Listing 4.25: The outer variable capture example from Listing 4.22 annotated with context parameters and effect declarations.

4.3.2.6 Lambda Expressions

Lambda expressions are similar to anonymous methods in that they are another means

of declaring anonymous inline executable code blocks. They are used in the same way

as anonymous methods, but differ significantly in the semantics of how they are checked

for correctness by the compiler.

The bodies of anonymous methods are checked using the declared parameter information. That is, the method body is validated before the assignment of the method to the delegate is validated. Lambda expressions, on the other hand, do not require parameter types to be specified; the types of the parameters are inferred from the delegate the expression is assigned to. Only after the type information for the signature is computed can the lambda expression's implementation be validated. The flow of type information is reversed in lambda expressions compared with anonymous methods. Listing 4.26 shows an invalid anonymous method and an invalid lambda expression. The anonymous method fails to compile because the int parameter cannot be converted to a short. The lambda fails to compile because it is bound to a delegate which accepts an int parameter; it would compile if the delegate accepted a short.

public class AnonVsDelegate {
    public bool IsZero(short value) { return value == 0; }

    public delegate string NumToStr(int value);

    public NumToStr anonMethod =
        delegate(short i) { return IsZero(i) ? "Empty" : i.ToString(); };

    public NumToStr lambda =
        i => IsZero(i) ? "Empty" : i.ToString();
}

Listing 4.26: An example code snippet showing the difference in the typing of anonymous methods and lambda expressions. The anonymous method fails to compile because the int i parameter cannot be implicitly converted to a short, while the lambda fails to compile because the delegate it is bound to takes an int parameter; it would succeed if the delegate took a short as a parameter.

This change to the order in which lambda expressions are type checked relative to other subroutine definitions has a significant impact on how context parameters and effect declarations are added to lambda expressions. The details of the methods a lambda expression invokes may not be computable until the expression is bound to a delegate type and the types of the lambda expression's parameters become known. Because of this, it is incorrect to ask the programmer to declare the side-effects of a lambda expression. The effects can, however, be computed as part of the lambda's implementation validation. This effect information can then be checked for consistency against the declared effects of the delegate. Further, because the programmer does not have to specify the types of the delegate's parameters, it is incorrect to ask the programmer to list a set of context parameters for the method. Context information is received from the delegate the lambda expression is being bound to. This means that there are no syntactic changes required for lambda expressions, but the compiler needs to do more work to carry context parameter information into the validation of the lambda's implementation. It must also compute the effect of the lambda expression after the body has been validated and ensure the effects are a subset of those declared by the delegate. Listing 4.27 shows an example of a calculator used previously to demonstrate the annotation of anonymous methods. The lambda expression syntax is not modified because the allowable effects are obtained from the delegate to which the lambda is assigned, to maintain consistency with the semantics of lambda expressions.

It is important to note that because lambda expressions are declared inline and passed

through the program using delegates, there is no way for them to be overridden. This

means that the lack of an effect declaration does not impact the correctness of effects in the presence of overriding, as was the case with anonymous methods.

Lambda expressions, as with anonymous methods, may capture local variables, thus

turning them into outer variables. The compiler uses the same techniques to implement

these outer variables and so the same effect computation procedures as were outlined

for outer variables associated with anonymous methods must be followed.
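As a sketch (in the Zal-style syntax used throughout this chapter; the identifiers are hypothetical), a lambda expression that captures a local variable receives the same outer-variable treatment as an anonymous method: the captured variable is hoisted into a compiler-generated scope class, so uses of it contribute heap effects on the this context rather than purely local stack effects:

public class LambdaCaptureEx[owner] {
    public delegate int Op() reads <this> writes <this>;

    public Op operation() reads <this> writes <this> {
        int total = 0;               // captured: hoisted to a field of a scope class
        Op op = () => total += 1;    // the write to total becomes a heap effect
        op();                        // invoking the delegate reads and writes this
        return op;
    }
}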


public class RPNCalc[owner] {
    public delegate int BinaryOp(int a, int b) reads <> writes <>;

    public static readonly BinaryOp add = (a, b) => a + b;
    public static readonly BinaryOp sub = (a, b) => a - b;

    Stack<int>|this| values = new Stack<int>|this|();

    public void Push(int value) reads <this> writes <this> {
        values.Push(value);
    }

    public void ApplyOp(BinaryOp op) reads <this> writes <this> {
        values.Push(op(values.Pop(), values.Pop()));
    }
}

Listing 4.27: An example of an ownership annotated Reverse Polish Notation calculator class based on the original C# example shown in Listing 4.20, with the anonymous methods replaced by lambda expressions.

4.3.2.7 Extension Methods

C# version 3.0 introduced extension methods as a mechanism to inject methods onto existing types without having to modify or extend the original type definition. Extension methods are implemented as static methods, but appear to be instance methods on the type of the extension parameter. Listing 4.28 shows an extension method declaration. The first parameter of the method is preceded by the this keyword, which denotes the static method as an extension method on C#'s builtin string type, an alias for the System.String type in the .NET Base Class Library.

public static class StringExtensions {
    public static int WordCount(this string str) {
        string[] words = str.Split(' ');
        return words.Length;
    }
}

Listing 4.28: An example of an extension method which adds a WordCount method to the interface of string.

The side-effects for an extension method could be computed like those for any other

static method. However, this would not fully exploit the benefits of having the extension method appear to be an instance method on the type of the extension parameter.

Instead, it would be more consistent to analyze the extension method as if it were a

method on the type of the extension parameter. This means that reads or writes of

the extension parameter’s representation would, therefore, generate a read or write of


this. Listing 4.29 shows the extension method shown in Listing 4.28 with the addition

of context parameters and effect declarations. Note how reads of the str variable cause

reads of the this context.

public static class StringExtensions {
    public static int WordCount(this string str) reads <this> writes <> {
        string[]|this| words = str.Split|this|(' ');
        return words.Length;
    }
}

Listing 4.29: An example of an extension method annotated with context parameters and effect declarations, showing reads of the extension parameter generating reads of this.

4.3.3 Types

In Chapter 3, the discussion of how to apply context parameters to data types was

limited to standard object-oriented classes used to produce heap allocated objects.

The C# programming language provides a number of other data types, some with

quite different semantics from classes, which also need to be annotated with context

parameters. In this section I discuss the application of Ownership Types to these types.

4.3.3.1 Ref and Out Call Parameters

Java employs strict pass-by-value semantics for method actual parameters. C# employs the same pass-by-value semantics, but also supports two additional parameter passing modes denoted on parameters using ref and out. A ref parameter is passed by reference, thus allowing the original source of the value to be updated by the method. An out parameter is the same as a ref parameter except that it is allowed to be uninitialized, and so the parameter must be assigned a value in the method body before it is used. Expressions used to compute actual ref and out parameter values must be assignable expressions. Obviously, these different passing conventions have an impact on the effect computation strategy outlined in Section 3.3.

Fortunately, the ref and out keywords identify which parameters may be assigned to by

a method. The use of one of these keywords identifies that the parameter is passed by

reference, rather than by value as is usually the case. Passing by reference means that


the caller can “see” any changes made by the method to the variables supplied. These

keywords document the read and write effects possible with these special parameters

and so these special parameters do not have to be included in the method’s effect

declaration.

When computing the effect of invoking a method with ref or out parameters, the

effect of evaluating the method’s actual call parameters needs to be modified. The

expressions passed by reference may be both read and written by the method. Rather

than perform a complex analysis to determine if the parameter passed by reference is

read, written, or both, my effect system conservatively assumes that it is both read and

written. That means that any expression supplied as an actual ref or out parameter

may be used as both an l-value and an r-value. To compute the side-effects of such an

actual parameter, the effect of reading the expression and assigning to the expression

need to be computed and the union of these two sets of effects is the total effect of

evaluating that parameter. This accounts for the allowed effects involving these special

parameters without having to add additional effects to the method signature since the

parameters themselves document the side-effects.
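A small sketch of how this conservative treatment plays out at a call site (Zal-style annotations; the Holder class and Swap method are hypothetical, introduced only for illustration):

public class Holder[owner] {
    public int value;
}

public static void Swap(ref int a, ref int b) reads <> writes <> {
    int tmp = a;
    a = b;
    b = tmp;
}

...
Holder|o| x = new Holder|o|();
Holder|o| y = new Holder|o|();
// Each actual ref parameter may be used as both an l-value and an r-value,
// so evaluating the arguments contributes the union of their read and write
// effects: the call as a whole reads <o> and writes <o>.
Swap(ref x.value, ref y.value);

The ref keywords at the call site already document that x.value and y.value may be both read and written, so no extra entries are needed in Swap's declared effect sets.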

4.3.3.2 Partial Types

C#, unlike Java, allows a class definition to be split into multiple parts in different

files. When generic type parameters are present on a partial type, all of the partial

type’s declarations must have the same number of generic type parameters in the same

order with the same names [78]. To maintain consistency, when a partial type is

parameterized by context parameters, all of the partial implementations of the type

must have the same context parameters with the same names in the same order. This

is necessary to ensure that the different parts are all correctly associated with one

another during compilation.
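For example (a sketch; the type and parameter names are illustrative, not drawn from the thesis's examples), both parts of a partial type must repeat the same context parameter list, exactly as they must repeat generic type parameters:

// File 1
public partial class Repository<T>[owner, dataOwner] {
    List<T>|dataOwner| items;
}

// File 2: same context parameters, same names, same order
public partial class Repository<T>[owner, dataOwner] {
    public void Add(T item) reads <this> writes <dataOwner> {
        items.Add(item);
    }
}

A third declaration written as Repository<T>[owner, ownerOfData] would be rejected because the context parameter names differ.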

4.3.3.3 Interfaces

Interfaces are collections of method signatures, without implementations, which a type

may promise to implement. Interfaces are not instantiated independently and do not

have their own representation. Interfaces do not, therefore, have their own this context;


they only have context parameters which may be used to help specify the read and write

effects of the methods on the interface. As a result, the special meaning of the first

formal context parameter being the owner is dropped for interfaces because they do

not have any representation of their own and they do not form part of any object’s

representation. An object which implements an interface has an owner and a this

context like any other object in Zal.

The methods in an interface can have many different implementations; each type which implements the interface supplies its own method implementations. To ensure consistent effects between all of the implementations, it is necessary to include effect declarations on all of the method signatures listed in an interface and ensure these effects are consistent with the method implementations when supplied.
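A sketch of an annotated interface (the names are hypothetical): the context parameter exists only so that effects can be declared on the method signatures, and it carries no owner meaning:

public interface IStore[dataOwner] {
    void Put(Object|dataOwner| item) reads <> writes <dataOwner>;
    Object|dataOwner| Get() reads <dataOwner> writes <>;
}

Any class implementing IStore must supply Put and Get implementations whose computed effects are consistent with these declared effect sets.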

4.3.3.4 Arrays

In C#, an array is a special object that can be used to hold a sequence of either references to reference types or copies of value types. Because array memory is heap allocated and arrays are passed by reference in C#, they, like any other object, need to have an owner which denotes the region of memory in which the array is located. It is important to note that the owner of the array is distinct from the owner of the elements of the array, if the elements are reference types; the array may be located in a different context from the objects it references.

C# also provides support for jagged and multi-dimensional arrays. Jagged arrays are simply arrays of arrays, and so each dimension of the jagged array needs to have an owning context supplied. Multi-dimensional arrays are single objects in which two or more indices are required to access an array element. These multi-dimensional arrays have a single owner since they represent a single object in memory.

Listing 4.30 shows examples of the different types of array discussed and shows those same examples annotated with context parameters. Note that the context parameters for the element's type appear immediately after the type's name, while the array's owner is situated directly to the right of the []s.


// single dimension arrays of value types
int[] values = new int[5];
int[]|owner| values = new int[5]|owner|;

// single dimension arrays of reference types
Object[] values = new Object[5];
Object|objOwner|[]|aryOwner| values = new Object|objOwner|[5]|aryOwner|;

// jagged arrays of reference types
Object[][][] values = new Object[5][][];
Object|objOwner|[]|dim1Owner|[]|dim2Owner|[]|dim3Owner| values =
    new Object|objOwner|[5]|dim1Owner|[]|dim2Owner|[]|dim3Owner|;

// multi-dimensional arrays
Object[,,] values = new Object[5,5,5];
Object|objOwner|[,,]|aryOwner| values = new Object|objOwner|[5,5,5]|aryOwner|;

Listing 4.30: Examples of single, jagged, and multi-dimensional arrays and their annotation with context parameters.

4.3.3.5 User Defined Value Types

In Java, all user-defined types are reference types: the object is allocated on the heap

and a reference is used to access the object. This means that only the reference to the

object is copied on assignment, not the object itself. As was discussed in Section 3.3.2,

I have chosen to handle stack and heap effects orthogonally since the aliasing of stack

locations is strictly controlled, unlike heap locations. This orthogonal handling was

chosen to simplify the ownership and effect systems. The handling of user defined

value types I present in this section is a direct result of this orthogonal handling.

C# provides user-defined reference types in the form of classes, as Java does. Unlike Java, C# also allows user-defined value types in the form of structs. When user-defined value types are assigned, an entire copy of the struct instance is made. An example of this copy-on-assignment behavior is shown in Listing 4.31. The restricted stack model of the C# programming language means that references to stack allocated objects are tightly controlled. They can be referenced only when passed as a ref or out parameter to a method (see Section 4.3.3.1) or when captured as an outer variable by an anonymous method or lambda expression (see Section 4.3.2.5).

public struct Employee {
    public string name;
}
...
Employee employee1 = new Employee(); // writes employee1
employee1.name = "Bob";              // writes employee1
Employee employee2 = employee1;      // reads employee1 and writes employee2
employee2.name = "Fred";             // writes employee2; employee1.name is still "Bob"

Listing 4.31: A code fragment showing the copy-on-assignment behavior of a user-defined value type.

The aliasing of user-defined value types is heavily restricted due to their copy-on-assignment semantics. This means that a struct is either stored on the stack or on the heap as part of some object's representation. The key point to note is that a user-defined value type is not allocated its own chunk of memory. Heap allocated objects which store structs allocate the memory as part of the object's implementation. Stack variables which hold structs reserve a piece of stack memory of sufficient size for the struct. This means that the user-defined value type does not have its "own" memory.

The net result of this is that a struct does not need to have an owning context. A struct

may have fields which hold references to non-value types, which means that value types may still need context parameters. A struct may, therefore, be parameterized by any

number of context parameters to facilitate the construction of types within the struct,

but the special owner meaning attached to the first context parameter in classes does

not apply to structs.

Consider the code in Listing 4.32, which shows an example of a struct representing a point in a two-dimensional coordinate system. Note that the struct holds a reference to a heap allocated object. Following a discussion of side-effects in structs, this example will be annotated with context parameters and effect declarations in Listing 4.33.

public class CoordinateSystem {
    ...
}

public struct Point {
    public int x;
    public int y;
    public CoordinateSystem system;

    public Point(int x, int y, CoordinateSystem system) {
        this.x = x; this.y = y; this.system = system;
    }

    public double GetDistance(Point p) {
        return Math.Sqrt(Math.Pow(x - p.x, 2) + Math.Pow(y - p.y, 2));
    }
}

Listing 4.32: An example of a struct in the form of a two-dimensional coordinate in a coordinate system. Note the struct holding a reference to the CoordinateSystem class.

Methods and other similar executable code blocks can read and write the fields of user-defined value types. The reads and writes of these fields can be described as reads and writes of the struct's representation using the this context. The difference between the this effects on the methods of structs compared to those on classes is in their abstraction when the this context cannot be named directly. When the this context is abstracted, it is abstracted to a container. In the case of a class, the this is abstracted to another context parameter representing a heap memory region containing the object's representation. In the case of a struct, the container may be a heap memory region in the form of a context (for a struct copy stored inside a heap allocated object) or a local variable name for a struct stored on the stack. This means that the this context may be abstracted to a context or a local variable depending on where the struct is being stored.

It is important to note that the this context cannot be used as a context parameter when naming a type inside a struct since it does not correspond to a region of the program heap; it is used only in effect sets to describe reads and writes of the struct itself. Listing 4.33 shows the Point struct from Listing 4.32 annotated with context parameters and effect sets. Note that the struct has a context parameter which represents the owner of the CoordinateSystem held by the struct. Also note the effects of the GetDistance method, which reads this, the current struct.

public class CoordinateSystem[owner] {
    ...
}

public struct Point[sysOwner] {
    public int x;
    public int y;
    public CoordinateSystem|sysOwner| system;

    public Point(int x, int y, CoordinateSystem|sysOwner| system)
            reads <> writes <> {
        this.x = x; this.y = y; this.system = system;
    }

    public double GetDistance(Point p) reads <this> writes <> {
        return Math.Sqrt(Math.Pow(x - p.x, 2) + Math.Pow(y - p.y, 2));
    }
}

Listing 4.33: The Point value type shown in Listing 4.32 annotated with context parameters and effect declarations.

4.3.3.6 Static Classes

C# allows types to be declared with a static modifier. The static modifier means that the type cannot be instantiated. Non-static classes are usually parameterized by a list of formal context parameters. The first parameter is, by convention, used to hold the owner of an instance of the type, i.e. the owner of an object. Because a static class cannot be instantiated, it does not need a context parameter for its owner. The static class may still have context parameters, as do user defined value types (see Section 4.3.3.5), to allow types to be constructed within the static class. There is no actual change to the syntax of the context parameter list, but the special meaning of the first parameter does not apply to classes declared to be static. It is important to note that because the type cannot be instantiated there is no this accessible within the type and there are no sub-context declarations.
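A sketch of a static class in this style (the names are hypothetical): there is no owner parameter because the class is never instantiated, but a context parameter still allows types to be named within the class:

public static class Registry[dataOwner] {
    static List<Object|dataOwner|>|dataOwner| entries;

    public static void Register(Object|dataOwner| o)
            reads <dataOwner> writes <dataOwner> {
        entries.Add(o);
    }
}

Note that no this appears anywhere in the body; all effects are expressed in terms of the declared context parameters.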

4.3.3.7 Nullable Types

Nullable types add null to the range of values that can be represented by a value type.

This means that a value type can have a specific value to represent unassigned. For

example, consider the bool data type: an unassigned bool variable defaults to a value

of false. This initial value of false cannot be distinguished from a value of false

assigned to the variable. A nullable bool (written bool?) can be initialized to null.

The null indicates that a value has not been assigned. The nullable types syntax used

in C# is syntactic sugar which hides the fact that the nullable types are implemented

by the compiler using the System.Nullable<T> struct. Because they are implemented

with structs, and so are value types, no context parameters are required and so there

is no modification of their syntax.
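A brief illustration in plain C# (the variable names are illustrative): the nullable form makes the unassigned state observable, and it desugars to System.Nullable<bool>:

bool plain = false;              // default false and assigned false are indistinguishable
bool? flag = null;               // null explicitly means "not yet assigned"
if (flag == null) {
    flag = false;                // now genuinely false
}
// What the compiler generates behind the ? syntax:
System.Nullable<bool> flag2 = new System.Nullable<bool>();
bool assigned = flag2.HasValue;  // false until a value is assigned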


4.3.3.8 Existing Types

When writing programs in Zal, it is necessary to make use of C# class libraries, such as the Base Class Library. These libraries contain a large number of classes including a number of data types, interfaces, and collection data structures regularly used in programs. Third party code may not always be annotated with owner or effect information.

Because it is highly desirable to continue using these classes despite the lack of declared effect information, I have included a syntactic construct in my implementation of the Zal language, similar to the wrapper classes introduced in Java 1.5 to facilitate the use of existing classes [114]. This construct, which I have called an effectTemplate, allows the programmer to provide ownership and effect information for existing definitions. These programmer-provided effects are currently treated by the compiler as assertions and are not validated; it is possible that this ownership and effect information could be validated by analyzing the byte-code, but the current compiler does not attempt to do so.

An effectTemplate must be tied to an existing type definition such as a class, struct,

or interface. The type tied to the template is referred to as the annotated type. The

annotated type is declared on the effectTemplate definition using C#-style inheritance syntax as shown in Listing 4.34. In the listing, an effectTemplate is declared to add an owner context parameter to the existing IList interface in the Base Class Library.

public effectTemplate IList<T>[owner] : IList<T> { ... }

Listing 4.34: An example of the syntax for declaring an effectTemplate to inject context and effect information onto an existing type.

An effectTemplate does not have to declare formal context parameters; if context parameters are not being added, the declaration syntax is slightly different, as shown in Listing 4.35. The annotated type is the type named after the effectTemplate keyword.

An effectTemplate serves two purposes: (1) it allows types constructed within the

annotated type to be parameterized with context parameters and (2) it allows effect


public effectTemplate IList<T> { ... }

Listing 4.35: An example of the syntax for declaring an effectTemplate which does not add formal context parameters to a type, but still injects context and effect information onto the member declarations it contains.

declarations to be added to the constructors, methods, properties, indexers, and dele-

gates declared in the annotated type. The effectTemplate contains no implementation

details; no bodies can be supplied on subroutine declarations and no initializers may be

supplied on fields. Further, all definitions in the template must have equivalent defini-

tions without ownership information in the annotated type. A template does not need

to declare effect and ownership information for all definitions in a type; the program-

mer may annotate only a subset of them. However, when declaring effect information

for two types S and T , where S is a subtype of T (S <: T ), it is necessary to ensure

the effects declared in S’s effectTemplate are a subset of any inherited definitions in

T ’s effectTemplate. This is the same restriction required of normal Zal classes and

ensures that the declared effects of a method are the maximum effects possible when invoking

the method or any overriding implementation. At present there is limited support for

this validation in the Zal compiler, but full support could be easily implemented using

existing compiler infrastructure.

As previously discussed, the compiler treats the information contained in

effectTemplates as programmer supplied assertions which must be true. Further,

an effectTemplate which adds context parameters to a type can coexist with the

original type definition and both may be used in a program. It may be possible to

build a verifier or inference system which processes Common Intermediate Language

(CIL) byte-code to automatically create effectTemplates for existing types, but such

work is beyond the scope of this thesis.

Listing 4.36 shows an effectTemplate which annotates some methods of C#’s

ICollection<T> interface.


effectTemplate ICollection&lt;T&gt;[owner, data] : ICollection&lt;T&gt;
        where T : Object|data| {
    void Add(T item) reads &lt;this&gt; writes &lt;this&gt;;
    void Clear() reads &lt;this&gt; writes &lt;this&gt;;
    bool Contains(T item) reads &lt;this, data&gt; writes &lt;&gt;;
    bool Remove(T item) reads &lt;this, data&gt; writes &lt;this&gt;;
    void CopyTo[arrayOwner](T[]|arrayOwner| array, int arrayIndex)
        reads &lt;this, arrayOwner&gt; writes &lt;arrayOwner&gt;;
    T Item {
        get reads &lt;this&gt; writes &lt;&gt;
        set reads &lt;this&gt; writes &lt;this&gt;
    }
}

Listing 4.36: An example of an effectTemplate which adds effect declarations to some of the methods of the ICollection&lt;T&gt; interface.

4.4 Statics

One of the major semantic and syntactic features present in both Java and C# not

discussed in most ownership type systems is statics. Static fields and methods belong to

the type they are declared on and not an instance of the type. The traditional system

of contexts discussed in Chapter 3 does not consider these static language features.

Static methods and fields were considered in one chapter of Potanin’s thesis [95]. I

have taken a different approach to handling statics for consistency with existing C#

semantics. This section discusses how to account for these language features in Zal.

4.4.1 Static Fields

The class a static field belongs to can be statically determined. This static storage

is separate from the representation hierarchy formed by heap allocated objects and

separate from the stack. Each class’s static storage is separate from the static storage

of every other class; there is no hierarchical structure as there is for heap allocated

objects.

The question remains of how to represent reads and writes of static fields in the effect

system. I propose that the static storage of each class be abstracted as a special static

context. Each type in the system has a context, with the same name as the type,

in which its static representation is stored. I call these contexts type contexts. All

type contexts are disjoint from one another and disjoint from all ordinary contexts.

Type contexts are dominated by the special top context, world. An example of a type


context as an effect is shown in Listing 4.37, where reading the static field value in the

DataStore results in a DataStore type context read.

public class DataStore {
    public static int value = 1;
    public int getValue() reads &lt;DataStore&gt; writes &lt;&gt; {
        return DataStore.value;
    }
}

Listing 4.37: An example of a read of a static field of the DataStore class causing a read of the DataStore type context.

The next thing to consider is what happens to static fields when classes are extended.

In C#, classes can inherit fields, including static fields, through subtyping. When static

fields are inherited, the child class and the parent class share the same instances of the

inherited static variables. This means that reads and writes of inherited static fields

must be treated as reads and writes of the type context in which they were originally

declared. An example of this is shown in Listing 4.38.

public class Parent {
    public static int value = 1;
}
public class Child : Parent {
    public int getValue() reads &lt;Parent&gt; writes &lt;&gt; {
        return Child.value;
    }
}

Listing 4.38: The read of Child.value actually reads the value field declared on the Parent class and so results in a read of the Parent type context.

Static fields declared on types parameterized by generic type parameters are unique

to the specific constructed instance of the generic type. For example, given a generic class Test&lt;T&gt; with a static field value, Test&lt;int&gt; would have a different value field from

Test<long>. This behavior is consistent with the use of type names to name type

contexts.
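This per-instantiation behavior can be modeled with a short sketch; the following Python code (the storage scheme and names are my illustration, not how the CLR implements it) keys static storage by the constructed type so that Test&lt;int&gt; and Test&lt;long&gt; receive distinct value fields:

```python
# Hypothetical model of per-constructed-type static storage:
# each (class name, type arguments) pair owns its own statics,
# mirroring how C# gives Test<int> and Test<long> separate fields.
_static_storage: dict[tuple, dict] = {}

def statics_of(class_name: str, *type_args: str) -> dict:
    key = (class_name, *type_args)
    return _static_storage.setdefault(key, {"value": 0})

statics_of("Test", "int")["value"] = 1
statics_of("Test", "long")["value"] = 2
# The two constructed types do not share the field.
```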

When a static field declared on a type parameterized by generic type parameters is

read or written, the static context named in the appropriate read and write effects

consists of the type name and the actual type parameters used to construct the type.

For example, a read of the value field on type Test<int> would cause a read of the


Test&lt;int&gt; type context. Like ordinary type contexts, the type contexts named for a generic type are heap effects. The specific type contexts constructed from a generic type are all disjoint provided the type parameters supplied on the type are not the

same. For example, the context Test<int> is disjoint from the context Test<long>.

Lastly, the behavior of static fields of types with context parameters needs to be con-

sidered. In the case of generic type parameters, a different static field is associated with

each constructed type. Further, generic type parameters can be used in the type of a

field as shown in Listing 4.39.

public class Example&lt;T&gt; {
    public static T value;
}
...
Example&lt;int&gt;.value    // value is an int
Example&lt;string&gt;.value // value is a string

Listing 4.39: An example of a generic type parameter being used in the declaration of a static.

When parameterizing a type with context parameters, it may be necessary to use some

of the context parameters to construct the types of static fields just as generic type

parameters may be used to construct the types of static fields. I have chosen to maintain

a behavior consistent with that of generics, i.e., a different static field is associated with

each type constructed with different actual context parameters. This means that for

a type Test[owner] with a static field value, constructed types Test|actual1| and

Test|actual2| would be associated with different value fields. Listing 4.40 shows an

example of a class with a static field whose type is constructed using the class’s formal

context parameter.

public class Example[owner] {
    public static Object|owner| value;
}
...
Example|actual1|.value // value is owned by actual1
Example|actual2|.value // value is owned by actual2

Listing 4.40: An example of a static field whose type is constructed using class context parameters.


4.4.2 Methods

Static methods are the same as instance methods except that they do not

have access to their type’s instance fields and methods. The effect declarations on these

static methods are the same as the effect declarations on instance methods with the

exception that the effect sets cannot contain the this context or declared sub-contexts

since the method is not associated with an instance of the type.

4.5 LINQ

One of the major additions made in version 3.0 of the C# programming language was

the LINQ query sublanguage [78]. LINQ provides a syntax reminiscent of SQL which

allows programmers to manipulate collections and other data sources in a declarative

manner. When a LINQ query is applied to a collection it can enumerate and process

the elements of a collection in much the same way as the foreach loop. An example

of a simple LINQ query is shown in Listing 4.41.

public class Item {
    public string Name;
    public static List&lt;Item&gt; items;
    public static void Main() {
        IEnumerable&lt;Item&gt; result =
            from item in items
            where item.Name.StartsWith("B")
            select item;
    }
}

Listing 4.41: An example of a simple LINQ query

Under the hood, LINQ expressions are syntactic sugar which are reduced to a series

of chained method invocations on the expression’s data source. The LINQ language

defines a number of methods, called standard query operators, which are used to im-

plement LINQ queries. The LINQ library shipped with C# version 3.0 supplies several

different implementations of the standard query operators including a set for collec-

tions included in the .NET Base Class Library. The LINQ query shown in Listing 4.41

would be implemented using the methods supplied in the LINQ library. The reduction

of LINQ expressions into chained method invocations is achieved through the repeated


application of a set of ordered rules supplied in the language specification [78]. Com-

puting the side-effects of a LINQ query or parts thereof is, therefore, no different than

computing the effects of any other series of chained method invocations provided that

the side-effects of the standard query operators are known. In my implementation of

Zal, discussed in Chapter 6, I have captured the standard query operator side-effects

to facilitate the analysis of LINQ expressions. Listing 4.42 shows the result of reducing

the LINQ expression shown in Listing 4.41 to a series of chained method invocations.

public class Item {
    public string Name;
    public static List&lt;Item&gt; items;
    public static void Main() {
        IEnumerable&lt;Item&gt; result =
            items.Where(n =&gt; n.Name.StartsWith("B"))
                 .Select(n =&gt; n);
    }
}

Listing 4.42: The reduction of the simple LINQ query example shown in Listing 4.41 to a series of chained method invocations.
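The shape of this reduction can be sketched in Python, where the where and select clauses correspond to filter and map over the data source (a loose analogy to the chained operator form, not the C# specification's reduction rules):

```python
# Loose analogue of reducing a LINQ query to chained calls:
# "from n in names where p(n) select f(n)" becomes
# map(f, filter(p, names)) -- each clause is one chained operator.
names = ["Bob", "Alice", "Bill"]

result = list(map(lambda n: n,                         # select n
                  filter(lambda n: n.startswith("B"),  # where clause
                         names)))
# result == ["Bob", "Bill"]
```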

Declarative operations on collections are amenable to parallelization as was demon-

strated with foreach loops in Section 2.4.1.1. As part of version 4 of the .NET Frame-

work, Microsoft has added a Parallel LINQ (PLINQ) library which can be used to

parallelize LINQ queries [79]. As was the case with foreach loops, not all LINQ

queries can be safely parallelized. Microsoft has not published a complete set of suf-

ficient conditions for the safe parallelization of LINQ queries, but they have provided

the following list of potential pitfalls with PLINQ [81]:

1. Do not assume that parallel is always faster

2. Avoid writing to shared memory locations

3. Avoid over-parallelization

4. Avoid calls to non-thread-safe methods

5. Limit calls to thread-safe methods

6. Avoid unnecessary ordering operations

7. Prefer ForAll to ForEach when it is possible


8. Be aware of thread affinity issues

9. Do not assume that iterations of ForEach, For and ForAll always execute in

parallel

Employing PLINQ on queries which have any of these pitfalls could result in a program

which may not produce a correct or consistent result.

Detecting if a LINQ expression writes to a shared memory location is easily done using

the effect system I have proposed once the LINQ methods have been annotated with

effect information. The annotation is necessary to ensure that any side-effects of specific

LINQ operations are accounted for in the effect system. My system could, therefore, be

used to help facilitate the exploitation of inherent parallelism in LINQ queries, just as

in foreach loops.


Chapter 5

Formalization

In the preceding chapters I have developed a system for reasoning about parallelism

in modern, imperative object-oriented languages. My proposals have been supported

by reasoned arguments, but I have not formalized and proved these ideas. The goal of

this chapter is to formalize my proposals and sketch proofs to show that my proposed

sufficient conditions for parallelism allow parallelization to take place only when doing

so would not violate any dependencies.

Proving the sufficient conditions for the safe application of parallelism using the type

and effect systems I have proposed requires a number of supporting proofs. The overall

structure of the argument presented in this chapter is shown in Figure 5.1. The arrows

in Figure 5.1 indicate how the proofs compose; the proof at the base of the arrow is

used as part of the proof at the head of the arrow.

Before the proofs of the correctness of the sufficient conditions for parallelism proposed

in Chapter 3 can be sketched, a number of language properties need to be established:

1. Subject Reduction — types are preserved by reduction

2. Progress — a well typed program is a value or may reduce

3. Soundness — an expression can only generate a runtime type error if it makes

use of an unsafe down-cast

4. Well Formed Heap — types are preserved on the heap



Figure 5.1: The structure of the proof of sufficient conditions for parallelism correctness presented in Chapter 5. The proofs of the items highlighted in red are cited in the literature rather than re-derived.

5. Owner Invariance — the owner of an object cannot be changed by casting,

assignment, or expression reduction

6. Cast Safety — if an expression makes use of up-casts, no invalid casts will be

encountered during reduction

7. Context Structure is a Tree — the contexts in a program form a tree rooted

at the world context.

8. Context Parameters do not survive — during reduction, all context param-

eters are replaced with locations


Having established that the above properties hold, the next step is to demonstrate that

the following properties hold for the effect system:

1. Effect Soundness — effects are preserved by all effect rules; no effect can be

captured and then lost

2. Effect Completeness — all reads and writes of the stack and heap are captured

as part of the computed effects

3. Disjointness Test Correct — demonstrates the basic pointer chasing algo-

rithm, used by the Zal runtime system for testing context disjointness, is correct

4. Context Disjointness Implies Effect Disjointness — two contexts being disjoint implies that the subtrees rooted at those contexts in the ownership tree are disjoint

5. Disjoint effects imply no data dependencies — if there are no overlaps

between the write sets of two expressions or the read set of one and the write set

of the other, then there are no data dependencies between the two expressions

Each of the key features at the core of the type and effect system I have proposed can

be found in different ownership systems in the literature. Rather than re-deriving the

proofs for these different core language features and the key properties of Ownership

Types, I cite existing work to argue for the language properties I rely on. The proofs for

the correct and safe operation of the advanced language features described in Chapter 4

can be directly derived from these core language proofs using the syntactic approach

for proving type soundness proposed by Wright and Felleisen [126].

I then use the properties proven for the core language to sketch how to prove that if

data dependencies (flow, output, and anti dependencies) are preserved by a parallelizing

transformation, then the result of the transformation will be sequentially consistent.

By sequentially consistent, I mean that executing the parallel program will produce

the same result and side-effects as executing the original sequential program. Finally,

I sketch proofs for the correctness of the sufficient conditions proposed for the safe use

of task, data, and pipeline parallelism as discussed in Chapter 3.


Readers uninterested in the details of these proofs and this more formal presentation

of the system can skip the remainder of this chapter. Readers who are interested in the

key details of these proofs should read on; the presentation generally follows the order

of discussion just presented with a few minor exceptions.

5.1 Type System

Before a program’s type and effect information can be relied on for the purposes of

parallelization, it is first necessary to argue that the language’s type and effect systems

generate complete and accurate information. In this section I cite previous work to

prove different features and facets of my proposed type and effect system. The goal is

to cite previous work to lend credence to the argument that my proposal is valid from a

type theoretic perspective. I will begin by citing existing Ownership Types literature to

argue for the safety of my proposed type system and I will follow this with an argument

for the safety and completeness of the effect system I have proposed.

5.1.1 Type Rules

When arguing for the correctness and safety of the type rules employed by my proposed

system, there are two main areas of concern: the core language (C#) and the ownership

extensions applied to the core language.

Fruja has published a proof of the type safety of the C# type system [42]. This proof covers the core features of the C# language, but does not directly cover C# generics. Jula has formalized the semantics of C# generics [65]; combined with the proofs of generic types proposed for Java, including Featherweight Generic Java (FGJ) [60], this suggests that the C# generics system should also be type safe. The proofs contained in these works support the argument that the language properties listed previously hold for the core language I chose to extend.

Ownership Types has been an area of active type systems research for a number of

years [28, 41, 95, 38, 24, 25]. Java has traditionally been used as a basis for ownership

research languages. While the C# language is different from Java in a number of ways,

the core operation of the two languages is largely similar as can be seen from the


similarities in the formulations of FJ’s [60] and Fruja’s proofs [42]. While some work

would have to be done to adapt the language proofs cited below for use in C#, the language similarities suggest that this should not be an overly onerous task.

At its core, the hierarchical single-owner system I have adopted in my proposal was

first proposed by Clarke, Potter, and Noble [28]. The type system supporting this

original formulation was quite strict [41]. Subsequent work by a number of authors has

helped to relax some of these restrictions. My proposal is most closely related to the JoE

research language and its subsequent derivatives [27, 24]. JoE employs a hierarchical

single owner system [27] with a notation quite similar to my proposed system. The key

properties I rely on, namely ownerships being immutable once assigned and ownership

contexts forming a tree hierarchy, are characteristic of all these systems.

Ownership Generic Java (OGJ) combines Java generics with ownership annotations in

a concise manner [95]. The final formulation presented in Potanin’s thesis includes full

language proofs which support the language properties I require as discussed in the

introduction. My use of C# generics with ownerships is similar to OGJ in a number of

ways. Like OGJ, Zal allows parent types and implemented interfaces on type declara-

tions to be annotated with ownership information. The handling of static fields in OGJ

is also similar to that in Zal. Unlike OGJ, Zal requires an explicit list of context pa-

rameters on type declarations in addition to the ownership annotations on super-types

and is purely syntactic in nature. C# does allow generic types to be parameterized by

primitive types, quite unlike Java. Since I elected to treat value and reference types or-

thogonally, extending Potanin’s proofs would be relatively straightforward and would

be a mechanical exercise.

Allowing ownerships to be resolved at a sub-object level of granularity has been pro-

posed in a number of Ownership Types systems including JoE [27] and Ownership

Domains [3]. The use of explicitly declared subcontexts is most similar to the explicit

declaration of domains in Ownership Domains [3] which is another well proven system.

The use of constraints on ownership context parameters on types and methods has

previously been proposed in the literature. The domination operator (<) was first

formulated in JoE [27]. The disjointness operator (#) was extensively discussed in the

multiple ownership MOJO system [25]. Both JoE and MOJO employ a hierarchical


ownership system and so these two different formulations could be easily combined.

Overall, none of the ownership specific language features I have proposed as part of

Zal is novel from an Ownership Types perspective. What is new and different is the

combination of language features and their combined use to reason about data depen-

dencies and inherent parallelism. Because the language features themselves are all well

proven in isolation, combining these proofs to prove the language properties I require is

a largely mechanical, if tedious and time consuming, process. Proving the correctness

of the annotation systems I have created could then be done using Wright and Felleisen’s syntactic approach [126] since these extended language features are, for the most

part, syntactic sugar.

5.1.2 Effect Rules

I am not the first to propose capturing side-effects of evaluation in terms of ownership

contexts. Previous systems such as JoE [27], Effective Ownership Types [73], and

Boyapati, Lee, and Rinard’s system for preventing data races and deadlocks [20] have

all employed some form of ownership effect annotation. These systems have all shown

how to construct an effect system that is both sound and complete. The style of effect

system that I have proposed is quite similar to that found in JoE and Boyapati et al.’s work. The major difference in my formulation is that I employ disjoint read and

write effect summaries without having a context read implied by a context write. This

does not materially change the effect system itself and so the proofs employed in these

previous systems could be easily adapted to work on Zal to prove the properties I require

hold. Specifically, proving that all read and write effects are captured when generated

and are not subsequently lost during effect computation of compound expressions is a

largely mechanical process.

5.2 Proof of Ancestor Tree Search Algorithm

Given that the type system I have proposed ensures that object owners are fixed once

allocated and that the ownership contexts form a tree, I now proceed to prove the

correctness of the runtime context disjointness test (#). In this section, I formally define


context disjointness, prove that context disjointness implies effect disjointness and,

finally, prove the correctness of the simple parent pointer chasing algorithm described

in Section 3.4.4 for runtime disjointness testing. The other algorithms I proposed, using

Dijkstra views and Skip Lists, are optimizations of this basic algorithm.

To help with the creation of a formal definition of the disjointness test, I first define a

helper function, ancestors, as shown in Figure 5.2.

Ancestors Contexts:

ancestors(Γ, l1) = l1, ancestors(Γ, Γ(l1))
ancestors(Γ, lworld) = ∅

Figure 5.2: The helper function ancestors used to test for context disjointness. Note that Γ represents the type checking environment and Γ(l1) obtains the parent of context l1.

Definition 5.1 (Context Disjointness).

l0 # l′0 iff l0 ∉ ancestors(l′0) and l′0 ∉ ancestors(l0).

Theorem 5.2 (Context Disjointness Implies Effect Disjointness).

If two contexts l0 # l′0, then the areas of memory represented by l0 and l′0 when named as effects are disjoint.

Proof. Ownership contexts form a tree hierarchy [28]. If l0 # l′0, then they lie

on different branches of the tree. By the structure of a tree we know that the subtrees

rooted at the context nodes, which define the scope of the effects, cannot overlap since each

object has exactly one owner. Therefore, to show that two contexts named as effects are

disjoint, it is sufficient to show that the two contexts are disjoint from one another.

Theorem 5.3 (Static Context Relations).

The world context is a dominator of all other contexts. An object of type C|k1, ..., kn| has owner k1, and k1 dominates the this context.

Proof. To prove the universal domination of world, we use the fact that the world

context is an ancestor of all contexts, and so, by the definition of context disjointness,


any disjointness test against world will always fail. Similarly, we know that the this

context is always a child of its owner k1, and so they are never disjoint.

Having formally defined disjointness and proven the equivalence of context and effect

disjointness, I now prove the correctness of the algorithm. To do this, it is necessary

to demonstrate that, if b is an ancestor of a, then following the parent pointers from a

will find context b. Since ownership contexts in my proposed system form a tree and

ownership information cannot change, the proof can be done by induction.

Figure 5.3: The base case for the ancestor algorithm inductive proof: a and b are the same node.

Figure 5.4: The induction step of the ancestor algorithm proof: b is a parent of the nth parent of a.

The base case is shown in Figure 5.3, where a and b are the same context. No pointers

need to be followed and b has been found in 0 steps from a. Now assume, as the induction hypothesis, that an ancestor b, n levels removed from a, can be found by following parent pointers in n steps. It now remains to prove that an ancestor b, n + 1 levels removed from a, can be reached by following parent

pointers from a in n+1 steps, as illustrated in Figure 5.4. By the induction hypothesis,

we know that we can get from a to the nth ancestor of a in n steps by following the

parent pointers. Following the parent pointer from the nth ancestor of a arrives at the

(n + 1)th ancestor of a, which is b, in n + 1 steps, concluding the proof of the traversal algorithm.

By performing the walk from both of the contexts being tested for disjointness, both

the case of a being an ancestor of b and b being an ancestor of a are handled. This

means that if a parent-child relationship exists between the contexts, this relationship

will be found. If such a relationship does not exist, then the two contexts must be

disjoint.


This algorithm will always terminate as any program has a finite number of objects

and there are no cycles in the ownership hierarchy.
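As a concrete illustration of this walk, the ancestor test can be sketched in a few lines of Python. This is a toy model of the context tree (the class and all names are mine, not the Zal runtime):

```python
# Illustrative sketch of the runtime disjointness test: contexts form a
# tree, and two contexts are disjoint exactly when neither is an ancestor
# of the other.

class Context:
    def __init__(self, parent=None):
        self.parent = parent  # the owning context; None for the root

def is_ancestor(b, a):
    """Walk parent pointers from a; report whether context b is found."""
    node = a
    while node is not None:  # terminates: finitely many objects, no cycles
        if node is b:
            return True
        node = node.parent
    return False

def disjoint(a, b):
    return not is_ancestor(a, b) and not is_ancestor(b, a)

root = Context()
a = Context(parent=root)
b = Context(parent=a)     # b is inside a's representation
c = Context(parent=root)  # c is a sibling of a
```

Because the hierarchy is a finite tree with no cycles, the while loop always reaches the root and stops, matching the termination argument above.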

5.3 Proof of Condition Correctness

Having argued that the type system underpinning my reasoning is sound, that the effect system is sound and complete, and that the runtime disjointness test proposed in Section 3.4.4 is correct, I conclude by proving that the sufficient conditions for parallelism I previously stated in Chapter 3 are, indeed, sufficient for their purposes.

Whenever part of a computer system, be it hardware or software, allows program op-

erations to be reordered, at runtime, relative to the order specified by the programmer,

there should be some kind of consistency model guaranteed by the transformational

process. The consistency model provides a specification of what changes can be made

to the execution order of the program and, consequently, what invariants will be pre-

served by the transformation.

These consistency models and the preservation of program meaning are important for

enforcing determinacy. Denning and Dennis provide an eloquent definition for determi-

nacy, “Determinacy requires that a network of parallel tasks in shared memory always

produces the same output for given input regardless of the speeds of the tasks” [36].

They also eloquently describe why it is important, “It tells us we can unleash the full

parallelism of a computational method without worrying whether any timing errors or

race conditions will negatively affect the results” [36].

When automatically applying a parallelizing code transformation to a sequential pro-

gram, it is generally desirable to ensure that the transformation satisfies Bernstein’s

Conditions for parallelism. Bernstein’s Conditions, as originally defined, state that two

sub-routines S1 and S2 can be safely executed in parallel provided that:

• IN(S1) ∩ OUT(S2) = ∅

• OUT(S1) ∩ IN(S2) = ∅

• OUT(S1) ∩ OUT(S2) = ∅


where IN(S) is the set of memory locations used by S and OUT(S) is the set of memory

locations written to by S [15]. All parallelizing transformations which guarantee that

these conditions are satisfied will produce the same results as the original sequential

program. There are specific cases where less strict conditions may suffice, but for

program correctness, in the general sense, to be preserved, these conditions must be

satisfied.
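For intuition, Bernstein's Conditions can be checked mechanically once the IN and OUT sets are available. The following Python sketch (the location names are hypothetical) tests the three intersections:

```python
# Bernstein's Conditions on IN/OUT sets modelled as Python sets of
# memory-location names.

def bernstein_parallel(in1, out1, in2, out2):
    """S1 and S2 may run in parallel iff all three intersections are empty."""
    return not (in1 & out2) and not (out1 & in2) and not (out1 & out2)

# S1: x = a + b and S2: y = a * 2 share only reads of a
independent = bernstein_parallel({"a", "b"}, {"x"}, {"a"}, {"y"})
# S1: x = a and S2: y = x has a flow dependence on x
dependent = bernstein_parallel({"a"}, {"x"}, {"x"}, {"y"})
```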

Bernstein's Conditions ensure that there are no dependencies between the code

blocks to be run in parallel. There are two categories of dependencies which may exist

in a program: control dependencies and data dependencies. If these dependencies are

preserved when the program is parallelized then the determinacy of the program is

preserved.

In all of the sufficient conditions I have proposed, I have stated that no control depen-

dencies which would prohibit parallelization exist in the code being analyzed. Existing

systems can be used to reason about and prove the absence of such control dependencies

and so I do not address control dependencies in this section; they are declared not to ex-

ist. This section focuses on proving that when the sufficient conditions I have proposed

are satisfied, the appropriate parallelizing transformation can be safely applied.

The sufficient conditions I have proposed focus on ensuring no data dependencies are vi-

olated when parallelizing code, and I prove the correctness of these sufficient conditions

in this section.

Before beginning the detailed proofs of correctness, it is important to define what a

data dependence is. I quote this definition from Goff, Kennedy, and Tseng [46]:

We say that a data dependence exists between two statements S1 and S2 if

there is a path from S1 to S2 and both statements access the same location

in memory. There are four types of data dependence:

• True (flow) dependence occurs when S1 writes a memory location

that S2 later reads.

• Anti dependence occurs when S1 reads a memory location that S2

later writes.


• Output dependence occurs when S1 writes a memory location that

S2 later writes.

• Input dependence occurs when S1 reads a memory location that S2

later reads.

Of these four types of data dependency, only three are usually considered when data

dependencies are discussed in relation to reordering code transformations: flow, output,

and anti dependencies. Input dependencies do not restrict reordering of operations and

so are, by common convention, ignored.

The first step in building the proofs of the sufficient conditions for parallelism is to link the effects computed by my effect system to data dependencies. To do so, I first define what it means for two sets of effects to be disjoint.

Definition 5.4 (Disjoint Effects).

If e and e′ are well-typed expressions with ⊢ e : ϕ and ⊢ e′ : ϕ′, then the effects ϕ are said to be disjoint from the effects ϕ′ (written ϕ#ϕ′), where ϕ = 〈r, w | x, y〉 and ϕ′ = 〈r′, w′ | x′, y′〉, when r#w′, w#r′, w#w′, x#y′, y#x′, and y#y′.
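Definition 5.4 translates directly into a mechanical check. The sketch below is illustrative only: effects are modelled as tuples of Python sets, and heap-context disjointness is supplied as a predicate (for example, the ancestor-walk test). Stack variables have unique names, so stack disjointness is plain set disjointness.

```python
# A hedged sketch of the effect-disjointness check. An effect is a tuple
# (r, w, x, y): heap contexts read/written and stack variables read/written.

def sets_disjoint(cs1, cs2, ctx_disjoint):
    # pairwise context disjointness, lifted to sets
    return all(ctx_disjoint(c1, c2) for c1 in cs1 for c2 in cs2)

def effects_disjoint(phi1, phi2, ctx_disjoint):
    r1, w1, x1, y1 = phi1
    r2, w2, x2, y2 = phi2
    return (sets_disjoint(r1, w2, ctx_disjoint)
            and sets_disjoint(w1, r2, ctx_disjoint)
            and sets_disjoint(w1, w2, ctx_disjoint)
            and not (x1 & y2) and not (y1 & x2) and not (y1 & y2))
```

With a toy predicate that treats distinct context names as disjoint, two effects sharing a written context fail the check, as the definition requires.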

Having defined effect disjointness, I now prove, in Theorem 5.5, that when two expressions have disjoint effects, no data dependencies can exist between them.

Theorem 5.5 (Disjoint Effects Imply No Data Dependencies).

If e and e′ are well-typed expressions such that e is evaluated prior to e′, and ⊢ e : ϕ and ⊢ e′ : ϕ′ with ϕ#ϕ′, then there are no flow, output, or anti dependencies between e and e′.

Proof. Let ϕ = 〈r, w|x, y〉 and ϕ′ = 〈r′, w′|x′, y′〉. The proof of this theorem is now done

by contradiction, considering the three cases of flow, output, and anti dependencies.

Begin by assuming, by way of contradiction, that there is an output dependency between

e and e′. If that dependence is via a field f on an object owned by context c, then

by side-effect information being sound and complete we know that c ∈ w and c ∈ w′.

However, from ϕ#ϕ′ we know w#w′, which provides the contradiction. If the output


dependence is via a variable then both must write to a variable x and, again, by side-

effect information being sound and complete, x ∈ y and x ∈ y′. However, from ϕ#ϕ′ we know y#y′, which provides the contradiction.

Now assume, once again by way of contradiction, that there is a flow dependence

between e and e′. If that dependence is via a field f on an object owned by context c,

then by effect soundness and completeness c ∈ w and c ∈ r′. However we know from

ϕ#ϕ′, w#r′, which provides the contradiction. If the flow dependence is via a variable

x, then by effect soundness and completeness x ∈ y and x ∈ x′. However, from ϕ#ϕ′ we know y#x′, which provides the contradiction and concludes the case. A mirror

argument may be made to prove that anti dependencies do not exist, which concludes

the proof.

I now prove that it is sufficient to preserve only flow, output, and anti dependencies to

preserve sequential consistency.

Lemma 5.6 (Update Dependency Preservation Sufficient for Parallelization with Se-

quential Consistency).

In the absence of control dependencies which would prohibit parallelization, if two code

blocks S1 and S2 have no flow, output, or anti dependencies between them, then they

can be safely executed in parallel.

Proof. For S1 and S2 to execute safely in parallel in my proposed system, the version

executed in parallel must be sequentially consistent with the original implementation.

If an output dependence exists between S1 and S2 with each writing a value to a field

f on an object o and they execute in parallel, then the final value of f depends on

whether S1 or S2 writes to f first. This race could cause the sequential consistency

model to be violated and so to guarantee sequential consistency we need to ensure no

output dependencies exist. Mirror arguments can be made for the need to prohibit flow

and anti-dependencies.

Assume now, by way of contradiction, that a flow dependence exists between S1 and

S2, with S1 writing a value to a field f on an object owned by context c, later read

by S2, and that they execute in parallel. If S1 writes a value later read by S2, the value read by S2 depends on the order in which S1 and S2 run. This could violate the

sequential consistency guarantee, so it is sufficient to ensure that no flow dependencies

exist. A mirror argument can be constructed for anti dependencies.

Finally, if an input dependence exists between S1 and S2, the value of the location read does not change. Hence, the reads can be performed in either order without violating sequential consistency, which concludes the proof.

A number of the disjointness operations in the following proofs operate on sets of con-

texts named as effects. I now define context set disjointness to simplify the presentation.

Definition 5.7 (Context Set Disjointness).

For sets of contexts L and L′, L#L′ holds when ∀l ∈ L (∀l′ ∈ L′, l#l′).

The rest of this section argues for the correctness of the sufficient conditions.

5.3.1 Task Parallelism

The first and simplest parallelization pattern I discussed was task parallelism — the

execution of two blocks of code in parallel. Two blocks of code, S1 and S2, can be safely executed in parallel provided that no control dependencies which would prohibit parallelization exist between them and no data dependencies, in the form of flow, output, or anti-dependencies, exist between them. I previously stated that it was sufficient to show:

• Task Condition 1 — no control dependencies exist which would prohibit par-

allelization.

• Task Condition 2 — the contexts read by statement S1 are disjoint from the contexts written by statement S2.

• Task Condition 3 — the contexts written by statement S1 are disjoint from the contexts written by statement S2.

• Task Condition 4 — the contexts written by statement S1 are disjoint from the contexts read by statement S2.

I now formalize these conditions and argue that they are indeed sufficient.


Theorem 5.8 (Task Parallelism Condition Sufficiency).

Given ⊢ S1 : 〈r, w|x, y〉 and ⊢ S2 : 〈r′, w′|x′, y′〉, then if r#w′, r′#w, w#w′, x ∩ y′ = ∅, x′ ∩ y = ∅, and y ∩ y′ = ∅, S1 can be run in parallel with S2 provided there are no control dependencies which would prohibit parallelization between S1 and S2.

Proof. Assume that c is a context where c ⊆ w and c ⊆ w′. Given that each context

has a single owner, it must be the case that either c ⊆ wi ⊆ w′j or c ⊆ w′j ⊆ wi for some valid i and j. Figure 5.5 shows the relationships between c, wi, and w′j. From the theorem we know wi#w′j for all valid i and j, which is a contradiction; thus there

cannot be a heap output dependence between S1 and S2. The absence of heap flow

and anti dependencies can be proved with mirror arguments using w with r′ and r with

w′, respectively. Stack locations have unique names and so the lack of overlap between

stack locations read and written ensures that there are no stack data dependencies.

Therefore, by Lemma 5.6, having proved the absence of data dependencies and assumed

a lack of control dependencies, the sufficient condition’s correctness is proved.

Figure 5.5: The relationship between contexts c1, c2, and object x.

5.3.2 Data Parallelism

A loop can be safely parallelized provided no data or control dependencies exist between iterations. In Section 3.5.2, I stated the following sufficient conditions for the

safe parallelization of a simple foreach loop of the form shown in Listing 5.1.

foreach (C|k| e in collection)
    e.operation();

Listing 5.1: The simple stereotypical data parallel foreach loop for which sufficient conditions for parallelization were developed.


The sufficient conditions stated were:

• Loop Condition 1: there are no control dependencies which would prevent loop

parallelization,

• Loop Condition 2: the elements enumerated by the iterator supplying elements

to the loop body must be disjoint, and

• Loop Condition 3: the operation's write set contains only the representation of the element on which it is invoked. It does not read the representation of any other elements in the collection, although it can read data from other disjoint contexts.

I now formalize these conditions and prove they are sufficient for their stated purpose.

Theorem 5.9 (Data Parallelism Condition Sufficiency).

Consider a foreach loop of the form shown in Listing 5.1. Let e1...en represent the

elements returned by the iterator to be processed by the loop. It is sufficient to show

that:

1. there are no control dependencies which would prohibit parallelization

2. ∀i, j ∈ 1...n, i ≠ j ⇒ ei ≠ ej

3. T operation() reads <r> writes <w> where ∀w ∈ w, w ≤ this and ∀r ∈ r, (r ≤ this ∨ r#k1)

for the loop iterations to be safely executed in parallel.

Proof. The proof is done by contradiction and the argument proceeds in a similar way

to that presented for the proof of correctness of the task parallelism sufficient conditions.

Assume, by way of contradiction, that an output dependence exists between iterations.

Let e1 and e2 be two separate elements in e1...en such that e1.operation() writes to

a field of an object in context x and e2.operation() writes to that same field on that

same object in context x.


From hypothesis condition 3, w may contain at most the context this. This means that

e1.operation() can write only to e1 or contexts strictly dominated by e1. Similarly,

e2.operation() can write only to objects that are either e2 or strictly dominated by e2.

By hypothesis condition 2, we know e1 ≠ e2. Figure 5.6 shows this set of relationships.

Figure 5.6: The relationships between e1, e2, and x as used in the proof of effect disjointness.

From this point, the proof proceeds by the same argument made to prove Task Condition 2. If x is contained in both e1 and e2, then it must be the case that either e1 dominates e2 or e2 dominates e1. However, because e1 and e2 have the same owner, k1, e1 ≠ e2. Since object owners cannot be changed and ownership contexts form a tree structure, we know that e1 ≠ e2 ⇒ e1#e2. Since e1#e2, there can be no context x which is part of both e1 and e2, since each context has a single owner, which provides the contradiction.

Assume now, by way of contradiction, that a flow dependence exists. Then e1...en must contain two distinct elements e1 and e2 such that e1.operation() writes to a field of some object in context x and e2 reads from the same field of that same object in x.

There are two possible locations for x since a context read, r, may be either dominated

by this or disjoint from k1.

If r is dominated by this then we need to prove that there is no x that is contained in

both e1 and e2, which we have just done. If, on the other hand, r is disjoint from e1,

we must show that there is no x such that x is disjoint from e1 and dominated by e2.


Figure 5.7 shows the relationships of e1, e2, r, and x.

Figure 5.7: The relationships of e1, e2, r, and x and the disjointness of k1 and r for the proof of effect disjointness.

If x is dominated by both e1 and r then it must be the case that r is dominated by

e1 or that e1 is dominated by r. However, from hypothesis condition 3, we know that

e1#r by r#k1 and we know that ownership contexts form a hierarchy. Together, this

provides the contradiction. A mirror argument can be made to prove the absence of

anti dependencies. Having, therefore, proved that no flow, output, or anti-dependencies

can exist when the conditions are met, the safety of parallelizing this loop follows from

Lemma 5.6 which concludes the proof.
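For intuition, the three loop conditions can be modelled as a simple check. In this hypothetical sketch (not the Zal runtime), contexts are plain strings, with "this" standing for the representation of the receiving element, and the set of contexts known to be disjoint from the shared owner k1 is supplied directly:

```python
# Illustrative check of Loop Conditions 2 and 3: elements pairwise
# distinct, writes confined to `this`, reads confined to `this` or to
# contexts disjoint from the elements' shared owner k1.

def loop_parallelizable(elements, op_reads, op_writes, disjoint_from_k1):
    # Condition 2: the iterator yields pairwise-distinct elements.
    if len(elements) != len(set(elements)):
        return False
    # Condition 3: writes stay within `this`...
    if any(w != "this" for w in op_writes):
        return False
    # ...and reads are within `this` or disjoint from k1.
    return all(r == "this" or r in disjoint_from_k1 for r in op_reads)

ok = loop_parallelizable(["e1", "e2", "e3"], {"this", "config"}, {"this"},
                         disjoint_from_k1={"config"})
bad = loop_parallelizable(["e1", "e1"], {"this"}, {"this"},
                          disjoint_from_k1=set())
```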

5.3.3 Pipeline

It now remains to prove that the algorithm presented in Chapter 3 for constructing a

Data Dependency Graph (DDG) correctly identifies all of the inter- and intra-iteration

loop dependencies. The DDG construction algorithm, as originally presented in Section 3.5.3, is as follows:

1. Consider all statements in the loop body pairwise (including each statement with

itself):

(a) Add an inter-iteration dependency if a flow, output, or anti-dependency exists between the two statements via any context which is not dominated by the element's this context, or via any stack variable declared outside the loop body's scope.

(b) Otherwise, if the two statements are different, add an intra-iteration depen-

dency if a flow, output, or anti-dependency exists between the two state-

ments via a context dominated by or equal to this context or any stack

variable declared inside the loop body’s scope.
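The two classification steps can be sketched as follows. The data model (per-statement effect sets keyed "hr", "hw", "sr", "sw", with "this" marking the element's own representation) is mine, for illustration only:

```python
# Sketch of the DDG classification: step (a) adds inter-iteration edges,
# step (b) adds intra-iteration edges. `outer` holds the stack variables
# declared outside the loop body.

def build_ddg(stmts, outer):
    inter, intra = set(), set()
    for i, s in enumerate(stmts):
        for j, t in enumerate(stmts):
            heap = ((s["hw"] & t["hr"]) | (s["hr"] & t["hw"])
                    | (s["hw"] & t["hw"]))
            stack = ((s["sw"] & t["sr"]) | (s["sr"] & t["sw"])
                     | (s["sw"] & t["sw"]))
            if not heap <= {"this"} or (stack & outer):
                inter.add((i, j))            # step (a)
            elif i != j and (heap or (stack - outer)):
                intra.add((i, j))            # step (b)
    return inter, intra

# Stage 0 writes the element and a loop-local tmp; stage 1 reads both
# and accumulates into a variable declared outside the loop.
s0 = {"hr": set(), "hw": {"this"}, "sr": set(), "sw": {"tmp"}}
s1 = {"hr": {"this"}, "hw": set(), "sr": {"tmp"}, "sw": {"total"}}
inter, intra = build_ddg([s0, s1], outer={"total"})
```

Here the flow of the element and of tmp from stage 0 to stage 1 is classified intra-iteration, while the repeated write to the outer variable total is classified inter-iteration.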

Before proceeding to prove that all the inter- and intra-iteration dependencies are

correctly identified by the DDG construction algorithm, I first present an example of

such a loop to refine the nature of the pipeline being considered. This example is shown

in Listing 5.2. I also present a formalized version of the algorithm. As was the case

with data parallelism, arbitrary pipeline stages can be transformed into the standard

form shown in Listing 5.2. Note that the behavior of the stack can be modelled using

objects and so an explicit stack is omitted from the formalization.

foreach (T|o| e in collection) {
    e.op1(...);
    e.op2(...);
    ...
    e.opn(...);
}

Listing 5.2: An example of the style of loop intended for pipelining.

Theorem 5.10 (Pipeline Condition Sufficiency).

Let e1 . . . em be the set of objects in collection to be processed by the pipeline,

op1 . . . opn be methods on all of the objects e1 . . . em, outer be the set of stack variables

read and written by the loop, inter be the set of inter-iteration dependencies as tuples

of stages and intra be the set of intra-iteration dependencies as tuples of stages. Each

stage has heap read effects rj, heap write effects wj, stack read effects xj and stack

write effects yj.

1. ∀i, j ∈ 1 . . . n:
((wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj)) ⊄ {this} ∨ (((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) ∩ outer ≠ ∅) ⟺ (stagei, stagej) ∈ inter


2. ∀i, j ∈ 1 . . . n:
i ≠ j ∧ ((∅ ≠ (wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj) ⊆ {this}) ∨ (((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) − outer ≠ ∅)) ⟺ (stagei, stagej) ∈ intra

Proof. Intra-iteration dependencies are a special case of inter-iteration dependencies.

All dependencies are inter-iteration dependencies unless the dependency exists entirely

on iteration-unique state.

Assume that, for two operations i and j in the loop, an output intra-iteration dependency exists between the stages. The proof of the correctness of these classifications now proceeds by contradiction. Assume, by way of contradiction, that an inter-iteration

dependency exists between the i and j operations. There are two cases for the depen-

dency to consider:

Case (Stack Dependence): If the inter-iteration dependence is caused through a

stack location, that stack location must be declared outside the scope of the loop. How-

ever from step 2 of the algorithm above, the outer variables are explicitly excluded when

checking for intra-iteration dependencies by the ((yi ∩ xj) ∪ (xi ∩ yj) ∪ (yi ∩ yj)) − outer ≠ ∅ term in the rule. This provides the contradiction, and the inter-iteration dependence cannot exist through the stack.

Case (Heap Dependence): If the inter-iteration dependence is caused through a heap

location, that heap location must be accessible to multiple loop iterations. However

from step 2 of the algorithm above, intra-iteration dependencies can exist only through

the representation of the data element being processed, by the ((wi ∩ rj) ∪ (ri ∩ wj) ∪ (wi ∩ wj)) ⊆ {this} portion of the rule. Because all of the elements in the collection share the same owner and are not equal, their representations must be disjoint, which provides the contradiction, and the inter-iteration dependence cannot exist through the

heap.

Mirror arguments can be made to show that no inter-iteration flow or anti-dependencies

can exist when the algorithm identifies an intra-iteration dependence. This concludes

the proof.


5.4 Summary

In this chapter I have argued for the correctness of the Zal language based on previous

Ownership Types work. I have also sketched the proofs of the correctness of the suffi-

cient conditions I proposed for exploiting inherent task, data, and pipeline parallelism.

These results provide rigor and precision to support my proposals.


Chapter 6

Implementation

In Chapter 4, I presented the design of the Zal programming language — C# version 3

extended with the Ownership Types based realization of my ideas from Chapter 3. Be-

fore I can use Zal in Chapter 7 to validate my ideas by applying them to representative

sample applications, I needed to implement a compiler for the language. The design

and implementation of my Zal compiler and its associated runtime ownership tracking

system is the focus of this chapter.

The main contribution of this chapter is the detailed design of my compiler and runtime

system. I chose to use the Gardens Point C# (GPC#) [32] research C# compiler as a basis for my compiler implementation. The resulting Zal compiler is a source-to-source compiler in the tradition of CFront [112]. This means that the compiler reads in Zal programs and produces C# source code. I, therefore, had to implement all of Zal's semantics in C#. There are numerous small technical contributions throughout this

chapter as I present the design of the compiler and how it implements Zal’s semantics.

Readers not interested in these technical details can safely skip this chapter.

6.1 Background

Once the need to implement a “compiler” had been identified, the first major decision

was choosing whether to extend an existing C# compiler or to write one from scratch. After studying the problem, it was decided to proceed with writing a new C# compiler

from the ground up. This thesis project is just one of many language research projects


and there is a common need to be able to experiment with new language features

and type system extensions. Such language research can be targeted at both general

purpose programming as well as more domain specific applications. The goal in writing

a new C] compiler was to create a new piece of research infrastructure which could be

used in these different projects. To facilitate this, it was necessary to modularize the

compiler design so that compilation steps could be combined as desired and extended

easily where necessary. Examples of this modularity include the declarative grammar

used to describe the language and the strict separation of phases as will be discussed

shortly.

We decided to implement a source-to-source compiler rather than producing Common

Intermediate Language (CIL) byte-code directly from the compiler. This simplified the

implementation of the compiler and made it easier to debug the compiler output. The

compiler produces C# code, and new language features added in research languages are implemented in C# during the code generation phase. The modular design means that

a byte-code generator could be written at a future date when it becomes necessary for

the research being undertaken.

The base compiler reads in a C# version 3.0 source file and produces a C# version 3.0 source file after type checking the program. Compiler extensions modify the input language and define the implementation of new language features in the C# code produced. The compiler has been tentatively named the GPC# compiler and is available

online [32].

6.2 Implementation Attribution

Before continuing to describe the design and implementation of the compiler and run-

time system I need to acknowledge and thank my supervisor Dr. Wayne Kelly and

Prof. John Gough for all the work they did helping to write the GPC# compiler. Their

assistance was invaluable in getting this project completed on-time. I refactored the

GPC# compiler so that I could then extend it with my proposed ownership type and

effect system. This refactoring included significant changes to the handling of type

parameters and parts of the type checking and resolution processes. The design and


implementation of the ownership extensions to the compiler and the associated runtime

libraries are entirely my own, but would not have been possible without the joint effort

to create the GPC] compiler.

Table 6.1 shows the relative sizes of the Ownership Extensions and the unmodified

Gardens Point C# compiler. Note that the core GPC# compiler statistics include code

written to provide the extension points required to implement the ownership exten-

sions. The creation of the extension points was non-trivial. The code required to

implement the extension points was generally easy to write, but required careful design

consideration and so consumed approximately 25% of the total development time of

the extensions.

Metric     | Zal Compiler Total | GPC#   | Extensions | Extensions (% total)
SLOC-P     | 39,444             | 27,288 | 12,156     | 30.8%
SLOC-L     | 22,201             | 14,957 | 7,244      | 32.7%
McCabe VG  | 6,152              | 4,248  | 1,904      | 30.9%

Table 6.1: Measures of the relative sizes of the Ownership Extensions and the GPC# compiler in terms of physical lines of code (SLOC-P), logical lines of code (SLOC-L), and cyclomatic complexity (McCabe VG) [76].

6.3 Design

As previously stated, the ultimate goal of writing our own C# compiler was to create a

research tool which could be used to experiment with language design including new lan-

guage features, type system extensions, and program analysis techniques implemented

on top of an existing industrial language. To facilitate this, the compiler design was

heavily componentized. The basic C# compiler was created and bootstrapped before I

began to create the compiler extensions needed to compile Zal.

In this section, I discuss the details of these different compilation steps and how I added

extension points to allow support of Zal to be added in a modular manner. Figure 6.1

shows the different operations performed by the compiler and the names of the methods

invoked to perform the operation.


Figure 6.1: A diagram showing the different parts of the compiler we have written. The Zal-only operations are shown in white boxes; these steps are skipped by the normal C# compiler.

6.3.1 Scanner & Parser

We decided to write our compiler in C#. There are several reasons for this choice:

existing research group familiarity with the language and associated tools, possibility of

bootstrapping the compiler as an interesting test case, consistency between the language

being compiled and the language of the compiler, and the advanced language features

of C# which we exploited in the compiler design, as will be discussed.

There are a number of scanning and parsing tools available today, but few of them

produce C# code. Coco/R is a popular parser generator which comes in versions for many languages, including C#. Coco/R accepts an attributed LL(k) grammar, for an

arbitrary k greater than 1, and produces a scanner and parser for the specified lan-

guage [125]. The Coco/R generated scanner was replaced with one generated using the

Gardens Point Scanner Generator (GPLex) [48]. The C# language specification allows Unicode characters to appear in the source of C# programs. The Coco/R generated


scanner could not handle Unicode characters and so was replaced to make the scanner and parser compliant with the C# language specification [78]. This substitution required

minor modification to the Coco/R file templates, but no modification to Coco/R itself.

When creating the scanner and parser for my ownership type and effect extensions, the

goal was to isolate the scanner and parser for the extensions from those of the GPC# compiler. Zal uses the same symbols as C#, so the GPLex scanner specification did

not need to be modified. A separate grammar definition is used to produce the Zal

parser, but the grammar is mostly the same as the standard C# grammar. The modified grammar produces the same basic Abstract Syntax Tree (AST) as the GPC# parser,

but Zal ASTs may contain some extra nodes and some nodes have extra fields and

methods added to track additional information. The modular design of the extension

allows the core GPC# compiler to still be compiled from the same code base without

modification.

The Coco/R grammar specification [125] consists of a sequence of production defini-

tions. Each production rule is specified using an EBNF expression as well as semantic

actions to execute as the EBNF expression is matched. The EBNF grammar allows

optional matching by placing portions of the EBNF expression between []s and op-

tional repeated matches using {}s. Semantic actions can be associated with any part

of the EBNF grammar; they are listed between (. .)s. Every production may

have a number of input and output attributes, which pass information into the production for

a non-terminal and return values from the production to its caller.
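To make the attribute mechanism concrete, the following is an illustrative Python sketch (not part of the Zal compiler; all names are hypothetical) of how an attributed production such as list = "|" name { "," name } "|" maps onto a hand-written recursive-descent method: repetitions ({}s) become loops, optional parts ([]s) become if tests, semantic actions run between the match calls, and the production's output attribute becomes the method's return value.

```python
# Illustrative sketch of an attributed recursive-descent production.
# Mirrors the shape:  list = "|" name { "," name } "|"
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def la(self):
        # lookahead token, or None at end of input
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        # match a terminal symbol or fail
        if self.la() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.la()!r}")
        self.pos += 1

    def context_name(self):
        # corresponds to a production with an output attribute: name<out i>
        name = self.la()
        self.pos += 1
        return name

    def actual_context_list(self):
        self.expect("|")                 # "|"
        names = [self.context_name()]    # semantic action: l.Add(i)
        while self.la() == ",":          # { "," name }  becomes a loop
            self.expect(",")
            names.append(self.context_name())
        self.expect("|")                 # "|"
        return names                     # the production's output attribute

print(Parser(["|", "owner", ",", "world", "|"]).actual_context_list())
```

The generated parser follows the same pattern, with the semantic actions inserted verbatim between the generated match calls.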

There were three main productions added: formal context parameters on type and

method definitions, actual context parameters on type constructions, and effect decla-

rations on invokable code blocks such as methods, property accessors, and constructors.

These main productions were added to existing language productions to implement the

language features as previously discussed in Chapter 4.

Listing 6.1 shows the grammar production rule which recognizes a list of actual context

parameters. A list of actual context parameters consists of at least one context param-

eter listed between ||s. The listed contexts are stored as a list of context parameters.

Listing 6.2 shows the grammar rule for the list of formal context parameters production.


actualContextList<. out UnresolvedContextActuals cl .> =
    (. cl = null; List<string> l = new List<string>(); .)
    (
        "|"                  (. string i; .)
        contextName<out i>   (. l.Add(i); .)
        {
            ","                  (. string i2; .)
            contextName<out i2>  (. l.Add(i2); .)
        }
        "|"  (. cl = l.Count > 0 ? new UnresolvedContextActuals(l) : null; .)
    ).

Listing 6.1: Coco/R grammar production for a list of actual context parameters in Zal.

Like the rule for the list of actual context parameters shown in Listing 6.1, the list

of formal context parameters consists of at least one context parameter name listed

between []s. The parameters are stored as a list of ownership contexts which can then

be attached to the appropriate declaration.

formalContextList<. out ContextFormals cf .> =
    (. List<OwnershipContext> cl = new List<OwnershipContext>(); .)
    "["             (. string i; Token tok0 = la; .)
    ident<out i>    (. cl.Add(new OwnershipContext(tok0, i, false)); .)
    {
        ","             (. string i2; Token tok1 = la; .)
        ident<out i2>   (. cl.Add(new OwnershipContext(tok1, i2, false)); .)
    }
    "]"  (. cf = cl.Count > 0 ? new ContextFormals(cl) : null; .)
    .

Listing 6.2: Coco/R grammar production for a list of formal context parameters in Zal.

Lastly, Listing 6.3 shows the grammar production for a set of declared read and write

effects. As was the case with context parameters, the declared effect sets are stored as

lists and attached to the appropriate declaration.

Apart from the productions for formal and actual context parameters as well as effect

declarations, the production for the foreach loop was modified to accept the features of

the enhanced foreach loop described in Section 2.4.1.2. Listing 6.4 shows the modified

foreach loop production; notice the optional ref keyword and the optional portion of

the rule which recognizes the at <index> notation just before "in" is recognized.


effectList<out UnresolvedEffects e> =
    (. List<string> reads = null; List<string> writes = null; .)
    [ IF (IsReads()) identifier  (. reads = new List<string>(); .)
      "<"
      [                     (. string i; .)
        contextName<out i>  (. reads.Add(i); .)
        {
          ","
          contextName<out i>  (. reads.Add(i); .)
        }
      ]
      ">"
    ]
    [ IF (IsWrites()) identifier  (. writes = new List<string>(); .)
      "<"
      [                     (. string i; .)
        contextName<out i>  (. writes.Add(i); .)
        {
          ","
          contextName<out i>  (. writes.Add(i); .)
        }
      ]
      ">"
    ]  (. e = new UnresolvedEffects(
              reads ?? new List<string>() { "world" },
              writes ?? new List<string>() { "world" }); .)
    .

Listing 6.3: Coco/R grammar production for a set of declared read and write effects.

6.3.2 Abstract Syntax Tree

The Abstract Syntax Tree (AST) is a tree representation of a program. The nodes

in the tree represent declarations, statements, and expressions. The design of the

AST in GPC] employs principles from aspect-oriented programming. As an example,

consider the AST node for an if statement. The IfStatement class derives from the

Statement class which in turn derives from the Node class. The IfStatement contains

constructors, properties, a type checking method, and a code generation method. To

help make the source easier to navigate and maintain, groups of semantically related

AST nodes are implemented in the same source code file. Further, we have used C#'s

partial classes to break the implementation of classes up into different source files by

aspect. These aspects correspond to AST node construction, type checking, and

code generation. These different aspect specific source files are organized into sub-

directories based on the compilation stage they are associated with. This means, for

example, that all of the TypeCheck methods for statements are located in the same

subdirectory of the source tree. The key advantage of this is that when working on

the compiler, programmers do not need to sift through code unrelated to the phase or


foreachStatement<out Statement s> = (. TypeRef t; Token tok0 = la; .)"foreach" "(" (. bool refVar = false; .)["ref" (. refVar = true; .)]type<false, out t> (. string i; .)ident<out i> (. TypeRef indexType = null; string index = null; .)[ IF (IsAt()) identifiertype<false, out indexType>ident<out index>]"in" (. Expression e; .)expression<out e> ")" (. Statement s2; .)embeddedStatement<out s2>(. s = new EnhancedForeachStatement(tok0, t, i, e, s2, indexType,

index, refVar); .).

Listing 6.4: The modified Coco/R production for the enhanced foreach loop in Zal.

pass they are working on. Figure 6.2 shows a portion of the compiler source file system

structure.

Figure 6.2: An illustration of the Zal compiler source directory structure where class implementation is split across source files using partial classes stored in subdirectories for each stage of compilation.

The language extensions I have proposed involve several new syntactic features. These

syntactic features need to be represented as part of the AST and so several new AST

node types had to be added. The AST nodes added are listed in Table 6.2. The

implementation details of these different AST nodes are the focus of the remainder of

this section.


Node Type                   Description
ContextFormals              A list of formal context parameters
                            (OwnershipContexts) attached to a type definition.
UnresolvedContextActuals    A list of context names attached to a type reference
                            which should resolve to OwnershipContexts during
                            type checking.
OwnershipContext            A formal context parameter on a definition which may
                            optionally have constraints in the form of a
                            ConstraintList.
ConstraintList              A set of constraints on a context parameter.
SubcontextDefinition        A subcontext declared in a type definition.
UnresolvedEffects           A list of context names read and written attached to
                            a definition which needs to be resolved to form a set
                            of ResolvedEffects during effect computation.
ResolvedEffects             A list of OwnershipContexts read and written produced
                            as a result of resolving the context names listed in
                            an instance of UnresolvedEffects.
EnhancedForeachStatement    An enhanced foreach loop, derived from the existing
                            ForeachStatement, which stores the additional
                            optional ref keyword and index expressions.

Table 6.2: The AST nodes added to represent the contexts, context constraints, effect declarations, and enhanced foreach loops.

6.3.3 Design of the Pluggable Type System

Starting with version 2, the C# language added support for generics. Adding type

parameters to type definitions and type references requires extensive compiler support

for checking and validation. The compiler infrastructure required to check and validate

generic type parameters is very similar to the infrastructure required to check and

validate other kinds of parameters on types including context parameters.

The term Pluggable Type System was first proposed by Bracha [12]. Bracha uses the

term to mean a system which supports the implementation of a number of optional type

systems. These optional type systems can be used to prove program properties. In his

definition, Bracha specifically mandates that an optional type system must not affect

the runtime semantics of the programming language to which it is applied. Generic

types are one notion of a parameterized type; generic types are types parameterized

with types. Abstracting the framework for generics in the C# language can, therefore,

provide a framework for the implementation of a pluggable type system on top of C#.

It is important to note that the implementation of ownership types I have chosen does


not strictly adhere to Bracha's requirement that an optional type system not affect

the runtime semantics of the language it is applied to. The handling of statics in

conjunction with ownerships is discussed in Section 4.4. The value read from a static

field is affected by the context parameters supplied. Nonetheless, abstracting the type

parameter framework so that it can be used to implement both generics and my type

and effect system would produce a framework that could also be used to implement a

truly pluggable type system.

When we were writing the C# compiler (see Section 6.2), the overall design of the pluggable type system infrastructure was not known. Generics were initially implemented

in the compiler. The implementation of generics was then refactored several times dur-

ing the process of adding context parameters to types. During these refactorings, areas

of common functionality were extracted and abstracted. When behavior specialization

of abstracted functionality was required, extension points were added to existing in-

terfaces to allow for this. The end result is the design presented in this section. The

presentation of the design which follows mirrors the development process which took

place; this structure makes the design process and choices made easier to explain. I

begin by discussing the implementation of generics and then abstracting from generics

the infrastructure to support arbitrary type parameters.

6.3.3.1 Generic Types

Generic types are types that are parameterized with one or more formal type param-

eters. The actual types which can be supplied for a given formal type parameter may

be constrained using a constraint clause. Listing 6.5 shows the declaration of a generic

type and the use of a generic type.

public class OrderedList<T> where T : IComparable<T> {
    ...
}
...
OrderedList<int> intList = new OrderedList<int>();

Listing 6.5: An example of the declaration and use of generic types in C#.

In the example shown in Listing 6.5, the type OrderedList has a single formal type

parameter T. The where clause on the OrderedList type stipulates that any type


supplied for T must implement the IComparable interface. The local variable

declaration of intList constructs a type reference to the OrderedList with an actual

type parameter of int.

Figure 6.3 shows the AST subtree generated for the class declaration shown in List-

ing 6.5. There are five types of AST nodes which appear in Figure 6.3. Brief descriptions

of these node types are shown in Table 6.3.

Node                        Description
StructuredTypeDefinition    Represents a user defined type such as a class,
                            struct, enum, or interface.
GenericFormals              A list of formal type parameters which is attached
                            to a StructuredTypeDefinition or MethodDefinition.
TypeParameter               A formal type parameter.
TypeParameterConstraint     A constraint on a formal type parameter; there may
                            be multiple constraints on a single parameter.
UnresolvedGenericActuals    A list of TypeReferences to be used as actual type
                            parameters to be bound to the formal type parameters
                            of the invoke expression or type reference to which
                            it is attached.
TypeReference               A name reference to a type definition; this is what
                            most programmers would usually think of, informally,
                            as a type name.

Table 6.3: Descriptions of the AST node types which appear in Figure 6.3.

Formal generic type parameters on a definition are represented by TypeParameter

nodes stored in a GenericFormals node attached to the definition. When a type is con-

structed, as happens in the type constraint clause on the type definition shown in List-

ing 6.5, the constructed type is represented by a TypeReference node. TypeReferences

are resolved to TypeDefinitions during type checking. Actual type parameters sup-

plied on a TypeReference are represented by a list of TypeReference nodes stored in

an UnresolvedGenericActuals node attached to the original type reference.

6.3.3.2 Extracting an Abstract Type Parameter Infrastructure

With the discussion of how generics were initially implemented in the compiler com-

plete, I now proceed to discuss how the framework was abstracted to support arbitrary

type parameters. There are three key AST node types in the implementation of generics in the

compiler: GenericFormals, UnresolvedGenericActuals, and ResolvedGenericActuals.


Figure 6.3: This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5.


Conceptually, GenericFormals represents a list of formal type parameters, each with an

associated list of type constraints, and UnresolvedGenericActuals represents a list of

actual type parameters. ResolvedGenericActuals is used to store the definitions that

actual type parameters resolve to during type checking. This is fundamental to the

operation of the generics implementation even though it is not shown on the figure.

The nodes which represent the formal and actual parameters themselves, rather than

the lists, are not as important because all of the resolution and validation operations

on a list of parameters are performed by the list holding the parameters. Making this

distinction ensures that any AST nodes desired can be used as parameters to types;

the abstraction will apply only to the containers for these parameters.

To abstract the parameterization of types, it was, therefore, necessary to abstract the

three node types outlined above. The next question was how to abstract these node

types. One option would be to abstract all three node types as C# abstract classes, but

this would limit the flexibility of design since parameters could not inherit from other

classes such as Node. With this in consideration, the decision was taken to abstract the

actual lists of parameters using C# interfaces to maximize flexibility. The constraints

are a rather more distinct entity in the AST and so I chose to abstract them using an

abstract class initially as converting the abstract class into an interface at a later date

would be trivial. Table 6.4 shows the abstract class and interfaces I wrote to abstract

the implementation of type parameters.

Type                    Description
ITypeFormals            Represents a list of formal parameters on a type
                        definition.
IUnresolvedTypeActuals  Represents a list of actual parameters on a type
                        reference before they have been resolved.
IResolvedTypeActuals    Represents a list of actual parameters on a type
                        reference after they have been resolved to their
                        definitions.

Table 6.4: The interfaces written to provide an abstract structure for the implementation of type parameters.

In addition to the abstractions above, there needs to be infrastructure added to allow

types to be parameterized with multiple different types of parameters. For example,

when my ownership extensions are in use, a type may be parameterized with both type

and context parameters.


Without additional infrastructure, different parameters on a type or method would need

to be aware of the other types of parameters that may also be present on the type.

To create a maximally flexible framework that would allow parameters to be added

in a modular manner, there needs to be some additional infrastructure to abstract

the parameters’ implementations and associated algorithms from one another. This

infrastructure would give each set of parameters the appearance of being the only

parameters on a type. The abstraction could be violated if necessary, but would simplify

implementation of the most common case.

To achieve the desired abstraction, I wrote three classes to store lists of formal param-

eters, unresolved actual parameters, and resolved actual parameters. Table 6.5 shows

the types for these amalgamated sets of parameters. These amalgamated parameter

implementations implement the standard parameter interfaces shown in Table 6.4. The

implementations simply invoke the same interface method on all of the parameter lists

contained in the amalgamation. Figure 6.4 shows the AST subtree for the type def-

inition shown in Listing 6.5 with the addition of these amalgamated type parameter

wrappers.

Type                            Description
AmalgamatedFormals              Wraps a list of ITypeFormals; that is, a list
                                of formal type parameter lists.
AmalgamatedUnresolvedActuals    Wraps a list of IUnresolvedTypeActuals; that
                                is, a list of unresolved actual type parameter
                                lists on a type reference.
AmalgamatedResolvedActuals      Produced by resolving the type parameter lists
                                in an AmalgamatedUnresolvedActuals.

Table 6.5: The classes used to wrap up lists of type parameters so that different parameter lists do not need to be aware of one another and so that parameters can be checked and resolved collectively.
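The delegation behaviour of the amalgamated wrappers can be modelled with a short sketch. The following Python code (purely illustrative; the class and method names are hypothetical stand-ins for the C# types above) shows the composite idea: every kind of parameter list implements one small interface, and the amalgamated wrapper implements that same interface by invoking the call on each list it contains, so generics and context parameters need not know about one another.

```python
# Illustrative model of the amalgamated parameter-list wrappers.
class ParamList:
    """The common interface implemented by every kind of parameter list."""
    def resolve(self, scope):
        raise NotImplementedError

class GenericActuals(ParamList):
    def __init__(self, names):
        self.names = names
    def resolve(self, scope):
        # resolve each type name against the scope
        return [("type", scope.get(n, n)) for n in self.names]

class ContextActuals(ParamList):
    def __init__(self, names):
        self.names = names
    def resolve(self, scope):
        # resolve each context name against the scope
        return [("context", scope.get(n, n)) for n in self.names]

class AmalgamatedActuals(ParamList):
    def __init__(self, lists):
        self.lists = lists
    def resolve(self, scope):
        # delegate the same interface call to every contained list
        resolved = []
        for lst in self.lists:
            resolved.extend(lst.resolve(scope))
        return resolved

scope = {"T": "int", "o": "owner"}
actuals = AmalgamatedActuals([GenericActuals(["T"]), ContextActuals(["o"])])
print(actuals.resolve(scope))  # [('type', 'int'), ('context', 'owner')]
```

Because the wrapper itself implements the shared interface, code that checks or resolves parameters can treat a mixed set of parameter lists exactly like a single list.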

Having created these wrappers, I deduced what methods needed to be implemented

by these abstractions while implementing ownership context checking and validation.

The consistency and correctness of type references are determined when the

type reference is resolved to a type definition. Once the type reference has been resolved

to a type definition, the definition can be checked for assignment compatibility with

other types to ensure the overall correctness and consistency of the program. There are

a number of methods used to do this, but the key methods are summarized in Table 6.6.


Figure 6.4: This figure shows the AST subtree generated by our C# compiler for the class declaration shown in Listing 6.5 after the amalgamated type parameter wrappers have been added.


Listing 6.6 shows the full declarations of the interfaces and the functionality they expose.

6.3.4 Effect Calculation and Validation

The Zal compiler executes two passes over the Abstract Syntax Tree (AST) to compute

effects: one to compute heap effects and one to compute stack effects. These two effect

computation passes require different computation contexts and so are most logically

implemented separately. The first pass computes all of the heap side-effects of all the

expressions and statements in the program and validates the method effect signatures

against the computed effects of their respective bodies. The second computes the stack

effects for all the statements and expressions in the program.

6.3.4.1 Heap Effects

Heap effect computation is implemented recursively over the nodes of the AST. Each

node has a computeEffects method. This method recursively calls computeEffects

on any child statements or expressions in the AST. Once that is done, the method com-

putes the overall effect for the node and stores it in the node’s sideEffects variable.

The caching of effects on the nodes is done so that effects do not need to be repeatedly

recomputed as different data dependency analyses are performed on the AST.

computeEffects takes three parameters: the current scope (used to resolve names and

types not cached during type checking when required), an effect computation environ-

ment which contains context relationship information as well as a reference to the type

definition the current code is in, and a boolean flag used to indicate if an expression is

being read or written (the flag is ignored when the effects of non-expressions are being

computed). The computeEffects method returns the overall effect of executing the

subtree of the AST rooted at the node it is invoked on.
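The recursive traversal with per-node caching can be modelled compactly. The following Python sketch (an illustrative model only; the node classes and effect representation are hypothetical simplifications of the compiler's AST) shows the essential shape: each node unions its own effect with its children's effects and caches the result in a sideEffects-style field so that repeated dependence analyses do not recompute it.

```python
# Illustrative model of recursive heap effect computation with caching.
class Effects:
    def __init__(self, reads=(), writes=()):
        self.reads = frozenset(reads)
        self.writes = frozenset(writes)
    def union(self, other):
        return Effects(self.reads | other.reads, self.writes | other.writes)

class Node:
    def __init__(self, children=()):
        self.children = list(children)
        self.side_effects = None  # cache, filled on the first call

    def compute_effects(self, scope, env, writing=False):
        if self.side_effects is None:
            eff = self.own_effects(scope, env, writing)
            for child in self.children:
                # recurse into child statements/expressions
                eff = eff.union(child.compute_effects(scope, env, False))
            self.side_effects = eff  # cache for later dependence analyses
        return self.side_effects

    def own_effects(self, scope, env, writing):
        return Effects()  # most nodes contribute nothing themselves

class FieldRead(Node):
    """A leaf that reads (or writes) a field owned by some context."""
    def __init__(self, ctx):
        super().__init__()
        self.ctx = ctx
    def own_effects(self, scope, env, writing):
        return Effects(writes=[self.ctx]) if writing else Effects(reads=[self.ctx])

block = Node([FieldRead("this"), FieldRead("owner")])
print(sorted(block.compute_effects(None, None).reads))  # ['owner', 'this']
```

The real compiler's computeEffects follows the same pattern, with the scope and environment parameters used to resolve names and context relationships at each node.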

Listing 6.7 shows the computeEffects method for a BlockStatement. The block

statement effect computation computes the side-effects of all the statements it contains

and unions them together to produce the total effect of the statement block as shown

by the formal type rule in Figure 6.5. The condition checking for the Rewriting flag


Method                  Description
ResolveNameHere         Used to look up formal type parameters as part of the
                        process of resolving unresolved references.
UnfixedParameters       Part of type parameter inference on method calls.
InferredParameters      Part of type parameter inference on method calls.
SameActualsAs           Used to test for type equality between two types
                        parameterized with type parameters.
CompatibleActualsWith   Used to test for assignment compatibility between two
                        types parameterized with type parameters.
Resolve                 Called to convert lists of unresolved parameters into
                        resolved parameters.

Table 6.6: Key methods from the interfaces used to abstract type parameters to create a pluggable type system.

public partial interface ITypeParameters {
    string mangle(string name);
}

public partial interface ITypeFormals : ITypeParameters {
    void TypeCheck(IScope scope, OverflowChecks check);
    void DefineNames();
    Definition ResolveNameHere(string name);
    TypeContainer ResolveTypeOrNamespaceHere(string name);
    IUnresolvedTypeActuals Unresolve();
    bool NotSpecified();
    IResolvedTypeActuals UnfixedParameters();
    IResolvedTypeActuals InferredParameters();
    IResolvedTypeActuals actuals { get; set; }
    bool SameActualsAs(ITypeFormals other);
    bool CompatibleActualsWith(ITypeFormals other);
    ITypeFormals Clone();
    List<ITypeFormals> FormalsOmittingInferrables();
    GenericFormals GetTypeParam();
    bool InstanceNeeded(IResolvedTypeActuals actuals);
    void CheckValid(IScope scope, Token pos);
    ResolvedEffects ResolveEffects(IScope scope);
    OwnershipContext GetOwner();
}

public partial interface IUnresolvedTypeActuals : ITypeParameters {
    IResolvedTypeActuals Resolve(IScope scope);
    IUnresolvedTypeActuals Clone();
}

public partial interface IResolvedTypeActuals : ITypeParameters { }

Listing 6.6: The key interfaces used to abstract the different kinds of type parameter lists and the methods declared on them.


is for the conceptual re-writing discussed in Section 6.3.4.3.

    Γ; ∆ ⊢ s : ϕ        Γ; ∆ ⊢ { s̄ } : ϕ′
    ─────────────────────────────────────  (Eff-Block)
         Γ; ∆ ⊢ { s; s̄ } : ϕ ∪ ϕ′

Figure 6.5: The effect rule for a statement block.

public override ResolvedEffects computeEffects(IScope scope,
        OwnershipEnv env, bool writing) {
    ResolvedEffects sideEffects = EffectsFactory.createEmptyEffect();
    if (statementList != null)
        foreach (Statement s in statementList)
            sideEffects = sideEffects.Union(env,
                s.computeEffects(this, env, false));
    return sideEffects;
}

Listing 6.7: The method for computing the heap effect of a statement block.

All heap effects originate from reads and writes of fields. A NameExpression

is the AST node which represents a field or variable name. Listing 6.8 shows the

computeEffects method for NameExpressions. The method determines what the

name refers to and from that determines what the effect of reading that name should

be.

6.3.4.2 Stack Effects

Stack effects are computed recursively over the AST using the method LocalEffects

in a similar manner to the computation of heap effects using computeEffects. The

computation of stack effects is, generally, simpler than heap effects since contexts can

be aliased and can dominate one another while local variable names are unique and are

simply tracked as a set of variable names read and written. The most complex effect

computation steps are for member expressions and statement blocks.

Member expressions must account for the reading and writing of structs allocated on

the stack. The this context of these structs stored in the stack needs to be mapped to

an appropriate stack effect. The stack location read or written has a unique name, the

name of the local variable which stores it. If the struct were stored in a heap allocated


public override ResolvedEffects computeEffects(IScope scope,
        OwnershipEnv env, bool writing) {
    ResolvedEffects sideEffects = EffectsFactory.createEmptyEffect();
    if (d == null)
        throw new SemanticError(pos, 0, "Can't resolve name 0", name);
    else if (d is FieldDefinition) {
        if (((FieldDefinition)d).Has(Modifiers.Readonly) ||
                (!d.Has(Modifiers.Static) &&
                 ((FieldDefinition)d).outerTypeScope.kind == Kind.Struct))
            sideEffects = EffectsFactory.createEmptyEffect();
        else if (!d.isStaticScope) {
            if (writing)
                sideEffects = sideEffects.Union(env, new ResolvedEffects(
                    new ResolvedContextActuals(new List<OwnershipContext>()),
                    new ResolvedContextActuals(new List<OwnershipContext>() {
                        ((FieldDefinition)d).computeContext(scope, env) })));
            else
                sideEffects = sideEffects.Union(env, new ResolvedEffects(
                    new ResolvedContextActuals(new List<OwnershipContext>() {
                        ((FieldDefinition)d).computeContext(scope, env) }),
                    new ResolvedContextActuals(new List<OwnershipContext>())));
        } else {
            if (writing) {
                sideEffects.WriteEffects.parameters.Clear();
                sideEffects.WriteEffects.parameters.Add(new
                    OwnershipContext(null, ((NamedTypeRef)((FieldDefinition)
                    d).outerTypeScope.ToTypeRef()).ToString(), true));
            } else {
                sideEffects.ReadEffects.parameters.Clear();
                sideEffects.ReadEffects.parameters.Add(new
                    OwnershipContext(null, ((NamedTypeRef)((FieldDefinition)
                    d).outerTypeScope.ToTypeRef()).ToString(), true));
            }
        }
    }
    else if (d is PropertyDefinition) {
        if (writing)
            sideEffects = raiseEffects(env,
                ((PropertyDefinition)d).ResolveSetEffects(),
                ((PropertyDefinition)d).outerTypeScope, env.Rewriting);
        else
            sideEffects = raiseEffects(env,
                ((PropertyDefinition)d).ResolveGetEffects(),
                ((PropertyDefinition)d).outerTypeScope, env.Rewriting);
    }
    else
        sideEffects = EffectsFactory.createWorldEffect();
    return sideEffects;
}

Listing 6.8: The computeEffects method of the NameExpression AST node.


partial class MemberExpression {
    public override Effect<string> LocalEffects(IScope scope, bool writing) {
        Effect<string> toReturn = e.LocalEffects(scope, false);
        if ((writing &&
                e.t is StructuredTypeDefinition &&
                ((StructuredTypeDefinition)e.t).Kind == Kind.Struct) ||
            (Binding is MethodDefinition &&
                ((MethodDefinition)Binding).outerTypeScope.Kind == Kind.Struct &&
                ((MethodDefinition)Binding).ResolveEffects().WriteEffects
                    .parameters.Contains(ResolvedContextActuals.thisContext)))
            toReturn.Union(e.LocalEffects(scope, true));
        return toReturn;
    }
}

partial class NameExpression {
    public override Effect<string> LocalEffects(IScope scope, bool writing) {
        Effect<string> effects = new Effect<string>();
        Definition d = name.ResolveNameAnywhere(scope);
        if (d is LocalDefinition || d is StructuredTypeDefinition) {
            if (writing)
                effects.AddWrite(name.i);
            else
                effects.AddRead(name.i);
        }
        return effects;
    }
}

partial class BlockStatement {
    public override Effect<string> LocalEffects(IScope scope) {
        Effect<string> effects = new Effect<string>();
        if (statementList != null) {
            foreach (Statement s in statementList)
                effects.Union(s.LocalEffects(this));
            foreach (string varName in LocalsList()) {
                effects.GetReadEffect().Remove(varName);
                effects.GetWriteEffect().Remove(varName);
            }
        }
        return effects;
    }

    protected List<string> LocalsList() {
        List<string> locals = new List<string>();
        foreach (Statement s in statementList)
            if (s is LocalVarDefStatement)
                locals.AddRange(((LocalVarDefStatement)s).GenSet());
        return locals;
    }
}

Listing 6.9: The implementations of the local effect computation methods LocalEffects on member and name expressions as well as block statements.


object, the this effect becomes the this context, a heap effect, of the containing object.

If it is nested inside another struct, the effect continues to be propagated “up” until

either a heap allocated object or a local variable is reached. Listing 6.10 shows an

example of this. There are no heap allocated objects in the example and so there are

no heap effects. Notice that the stack effects calculated for the expressions computing

the rectangle height and width read only the r1 stack variable.

public struct Point {public int x;public int y;public Point(int x, int y) reads<> writes<> {

this.x = x; this.y = y;}

}public struct Rectangle {

public Point topLeft;public Point bottomRight;public Rectangle(Point topLeft, Point bottomRight) reads <> writes <> {

this.topLeft = topLeft;this.bottomRight = bottomRight;

}}...Point p1 = new Point(1,1); // writes p1Point p2 = new Point(3,4); // writes p2Rectangle r1 = new Rectangle(p1,p2); // reads p1,p2 writes r1int width = r1.bottomRight.x - r1.topLeft.x; // reads r1int height = r1.bottomRight.y - r1.topLeft.y; // reads r1

Listing 6.10: An example of the mapping of struct this contexts to stack variables during stack effect computation.

Statement blocks remove local variables, whose scope is the block, from their effect

sets; this is done due to the lexical scoping of variables. The variables declared in a

block are internal details of the block; code outside the block's scope cannot access

the variables declared within the block. The removal of these variables from the block’s

effect set hides the block’s implementation details.

Listing 6.9 shows the implementation of the member expression, name expression and

block statement LocalEffects methods.
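The set manipulation that Listing 6.9 performs can be isolated into a small, self-contained sketch. The class and method names below (EffectSets, MaskLocal, BlockEffectDemo) are illustrative stand-ins, not the Zal compiler's actual types:

```csharp
using System;
using System.Collections.Generic;

// A minimal sketch of how a block statement masks its locals out of the
// effect sets it reports upward; names are hypothetical, not Zal's.
public class EffectSets {
    public HashSet<string> Reads = new HashSet<string>();
    public HashSet<string> Writes = new HashSet<string>();

    // Union another statement's effects into this set.
    public void Union(EffectSets other) {
        Reads.UnionWith(other.Reads);
        Writes.UnionWith(other.Writes);
    }

    // Remove a block-local variable: code outside the block cannot
    // name it, so it is an implementation detail of the block.
    public void MaskLocal(string varName) {
        Reads.Remove(varName);
        Writes.Remove(varName);
    }
}

public static class BlockEffectDemo {
    public static EffectSets Run() {
        var body = new EffectSets();
        body.Reads.UnionWith(new[] { "r1", "tmp" });
        body.Writes.Add("tmp");

        var block = new EffectSets();
        block.Union(body);
        block.MaskLocal("tmp");   // tmp is declared inside the block
        return block;             // only the read of r1 escapes
    }
}
```

As in Listing 6.9, the block first unions the effects of its statements and only then subtracts the variables it declares, so effects on locals never escape the block's scope.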

6.3.4.3 Loop Body Rewriting

The sufficient conditions developed for data parallel loops in Section 2.4.1.1 applied

only to a simple foreach loop in which a single method is invoked on the loop's iteration

variable. To handle arbitrary loop bodies, I presented the idea of a conceptual rewriting

whereby the loop body was transformed into a method on the iteration variable. When

I presented this idea, I emphasized that it was a conceptual rewriting and that,

in practice, the same results could be obtained by modifying the effect computation

algorithms. In this section I present how this conceptual rewriting was implemented in

the Zal compiler.

Heap effects are computed recursively over the AST using the computeEffects method.

The computeEffects method takes three parameters: the current scope of evaluation

(used to resolve names and types), an OwnershipEnv environment object, and a bool

flag which determines if the current expression is being used as an l-value or r-value. The

OwnershipEnv object is used to carry information down through the effect computation

process. It carries several different pieces of information:

• The current type being processed

• The current subroutine construct being evaluated (method, constructor, property

accessor, etc.)

• Context relationship information

• The current rewrite variable, if any

The OwnershipEnv is the key to the conceptual rewriting process. Unless an effect

computation for a conceptual rewriting is in process, the current rewrite variable is

unset. When a conceptual rewrite is needed, the following changes are made to a copy

of the current OwnershipEnv:

• set the rewrite variable type and name to that of the iteration variable

• clear any context relationships from the environment that pertain to the current

this context and add the element type’s owner as a dominator of this.
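The two modifications above can be sketched as a copy operation on the environment. The class shape and the string encoding of context relationships below are assumptions for illustration only; the Zal compiler's OwnershipEnv is not reproduced in the text:

```csharp
using System;
using System.Collections.Generic;

// A hedged sketch of the environment copy made before computing loop-body
// effects; field names and the relationship encoding are hypothetical.
public class OwnershipEnv {
    public string RewriteVarName;   // unset outside a rewrite
    public string RewriteVarType;
    public List<string> ContextRelationships = new List<string>();

    public bool Rewriting { get { return RewriteVarName != null; } }

    // Build the modified copy used while rewriting a loop body.
    public OwnershipEnv ForLoopRewrite(string iterVarName, string iterVarType,
                                       string elementOwner) {
        var copy = new OwnershipEnv();
        copy.RewriteVarName = iterVarName;
        copy.RewriteVarType = iterVarType;
        // Drop relationships about the current this context, keep the rest,
        // and record that the element type's owner dominates this.
        foreach (var rel in ContextRelationships)
            if (!rel.Contains("this"))
                copy.ContextRelationships.Add(rel);
        copy.ContextRelationships.Add(elementOwner + " dominates this");
        return copy;
    }
}
```

Because the copy is separate from the original environment, the rewrite is confined to the loop body: effect computation outside the loop continues with the unmodified environment.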

The OwnershipEnv class has a read-only property called Rewriting which tests if

the rewrite variable information has been set. This modified OwnershipEnv object is

then passed to the computeEffects method of the loop body. All of the expressions

which add contexts to the read and write effect sets check this property to see if a


rewrite operation is in progress. If a rewrite is in progress, the following changes are

made to the way effects are computed:

• Reads of fields via the this variable (directly or indirectly) cause a read of the

current type’s owning context, if there is one.

• Writes to fields via the this variable (directly or by implication) cause a write of

the current type’s owning context, if there is one.

• Reads of fields via the rewrite variable cause a read of the this context.

• Writes of fields via the rewrite variable cause a write of the this context.

• When a method is invoked, if it is invoked via the rewrite variable then the

this context is not replaced with the object owner, otherwise the this context is

replaced with the owner of the type on which the method was invoked, if there is

one. This is done since the this context is nameable only within the object whose

representation it holds.

This set of rules ensures that the effect set computed for the loop body is

expressed in terms of effects from the perspective of the iteration variable, just as

would be the case if the loop body was actually rewritten to be a method on the

iteration variable.
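The per-access remapping these rules describe can be condensed into a single function. The string-based encoding of receivers and contexts below is an assumption for illustration; the compiler works over AST nodes rather than strings:

```csharp
using System;

// Sketch of the context remapping applied while a rewrite is in progress.
public static class RewriteEffectMapping {
    // receiver: the variable an effect occurs through; returns the context
    // recorded in the effect set during the conceptual rewrite.
    public static string MapContext(string receiver, string rewriteVar,
                                    string currentTypeOwner) {
        if (receiver == "this")
            return currentTypeOwner;  // abstract this up to the type's owner
        if (receiver == rewriteVar)
            return "this";            // the iteration variable becomes this
        return receiver;              // other receivers are unchanged
    }
}
```

Effects through this are abstracted to the enclosing type's owner, while effects through the rewrite variable become effects on this, exactly as if the loop body were a method on the iteration variable.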

This same rewriting process can be used to compute the side-effects of extension meth-

ods since the effects of an extension method should appear to be effects on the extension

parameter object as was discussed in Section 4.3.2.7. There is one interesting corner

case to consider when conducting a rewrite of a loop in a struct, a value type. The

struct does not have an owner, so reads and writes of fields via the

this variable cannot be abstracted to reads and writes of the type's owning context.

To avoid missing dependencies created via the fields on the current value type, there

needs to be a check that the loop body does not write to the this context when the loop

body effects are computed normally. If such a write takes place, then a loop carried

dependence could exist which is prohibited by my sufficient conditions.

Note that user defined value types do not have owners. Since a rewrite of the loop hides

these effects, before parallelizing loops a check needs to be made in the user defined


value types that they do not write the this context when their effects are computed

normally. If they do, there is a loop carried dependence which is prohibited by my

sufficient conditions.


6.4 Zal Code Generation

I have written several libraries of data structures and helper methods which are used

by the Zal compiler’s code generation stage to facilitate the implementation of Zal

programs in C#. The most important of these data structures and methods are those

which facilitate runtime tracking of ownership and the implementation of types an-

notated with runtime ownership information. In addition to this core functionality, I

have also written libraries to support the implementation of enhanced foreach loops

in C# and the pipelining of foreach loops. This section discusses how the compiler im-

plements ownerships in C# during the BuildOwnershipImplementation pass and the

design of the runtime libraries used to support this implementation. It also discusses

other runtime libraries written to simplify other implementation steps performed by

the compiler.

6.4.1 Runtime Ownership Implementation and Tracking

Reasoning about inherent parallelism using effects expressed using contexts requires

there to be some basis for determining if two arbitrary contexts are disjoint. While

there are a number of language features that can be used to facilitate static reasoning

about this disjointness (see Section 3.4), there are still a number of cases where it is

desirable to perform runtime tests on the relationship between two arbitrary contexts

as was discussed in Section 3.4.4. This means that when programs written in Zal are

compiled to C#, additional fields and methods need to be added to the emitted code to

track ownership at runtime. In this subsection I present the design and implementation

of this runtime ownership tracking system as well as that of the runtime libraries used

to facilitate this tracking and context relationship testing at runtime.

6.4.1.1 Ownership Implementation

The tracking of ownerships at runtime can be achieved through the addition of methods

and fields to the classes emitted by the Zal compiler. This implementation approach

allows ownership tracking to be implemented without modifying the Common Language

Runtime (CLR), although modifying the CLR directly might be preferable.


Because Zal is implemented in C#, it is desirable to be able to tell which types in

a program support ownership tracking and which do not. To do this I introduced

the IOwnership interface. All classes emitted by the Zal compiler implement this

interface. This means objects modified to support the runtime tracking of ownerships

can be distinguished from those which do not by testing to determine if the IOwnership

interface is implemented. In cases where the interface is not implemented, the runtime

system uses safe default ownership and effect information.

The methods in the IOwnership interface are determined by the data structure being

used to store ownership information and provide the minimum functionality required

to traverse the ownership tree, and test contexts for disjointness as was discussed in

Section 3.4.4. In Section 3.4.4, several different structures were discussed including

simple parent pointers, Dijkstra views, and skip lists. Out of these different structures,

I have chosen to implement two ownership tracking systems. One of these implemen-

tations uses Dijkstra Views, where each object keeps an array of references to all of its

ancestors back up to the special top context world. The other uses the simple parent

pointer system where each object maintains a pointer to its parent. To help improve

the performance of runtime tests in the simple parent pointer system, I have elected to

cache the depth of each object in the ownership hierarchy on the object. This allows

a best and worst case complexity of O(n − m), where m and n are the depths of the

objects being compared.
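The depth-assisted walk that yields this O(n − m) bound can be sketched as follows. Node is a hypothetical stand-in for objects implementing IOwnership; the walk itself follows the parent pointers exactly as described:

```csharp
using System;

// Sketch of the depth-assisted ancestor test for the parent-pointer
// system; Node stands in for objects carrying owner and cached depth.
public class Node {
    public Node Owner;
    public int Depth;          // cached depth in the ownership tree
    public Node(Node owner) {
        Owner = owner;
        Depth = owner == null ? 0 : owner.Depth + 1;
    }
}

public static class ContextTests {
    // Is a strictly inside b's context? Walk a up exactly
    // (a.Depth - b.Depth) parent links, then compare.
    public static bool IsInside(Node a, Node b) {
        while (a != null && a.Depth > b.Depth)
            a = a.Owner;
        return a == b;
    }
}
```

Without the cached depth, deciding that two contexts are unrelated would require walking both chains all the way to world; the cache lets the test stop after the depth difference is consumed.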

To reduce the amount of code that needs to be emitted into each class and facilitate

easy modification of the ownership tracking system I chose to expose all necessary ob-

ject ownership tracking functionality through the IOwnership interface. This interface

allows library code to be used for most of the functionality with only accessor and mu-

tator methods required on the objects themselves. I have created two versions of the

IOwnership interface, one for each of the two different systems I have implemented.

The Zal compiler can be toggled to produce C] code conforming to either of these

interfaces; the choice of which to use is made based on the ratio of object creations to

context relationship tests as I discuss later in this chapter.

The Dijkstra views implementation has the advantage of providing constant time run-

time tests for the relationship between contexts, but the disadvantage of incurring


object creation time and memory usage overheads proportional to the height of the

ownership tree. Objects tracking ownership information using Dijkstra views, there-

fore, needed to provide some means of (1) initially setting the array of ancestor contexts,

(2) getting the array of ancestors, (3) getting the object's immediate owner, and (4)

getting the current depth of the object in the ownership tree. Listing 6.11 shows the methods

of the Dijkstra View’s version of the IOwnership interface. Notice that the arrays of

ancestors operate on arrays of objects not arrays of IOwnership types. This is so that

ordinary C] objects can be made to supply the default owner context of world for the

purposes of effect calculation.

public interface IOwnership {
    object[] GetParents();
    void SetParents(object[] value);
    object Owner();
    int Depth();
}

Listing 6.11: The IOwnership interface implemented by all types emitted by the Zal compiler when Dijkstra Views based tracking is selected.
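The constant-time test a Dijkstra view enables can be illustrated directly: if b sits at depth m in the ownership tree, then a is inside b's context exactly when a's ancestor array holds b at index m. The helper below is a sketch over plain arrays, not code from the thesis:

```csharp
using System;

// Sketch of the O(1) ancestor test over a Dijkstra view.
public static class DijkstraViewTest {
    // parentsOfA: a's ancestors ordered from world (index 0, depth 0)
    // down to a's immediate owner.
    public static bool IsInside(object[] parentsOfA, object b, int depthOfB) {
        return depthOfB < parentsOfA.Length && parentsOfA[depthOfB] == b;
    }
}
```

A single array index and comparison replaces the pointer walk of the parent-pointer system, which is the source of the constant-time advantage discussed above.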

The parent pointer based runtime system has the advantage of minimizing the memory

and object creation overheads incurred at the expense of slower context relationship

tests. In the system, objects must provide some means of (1) initially setting their

parent pointer and (2) getting their parent pointer. As an optimization, as previously

discussed, objects can also provide their depth in the ownership hierarchy. Listing 6.12

shows the methods of the parent pointer version of the IOwnership interface. As was

the case with the Dijkstra Views implementation, notice that the parent is an object

and not an instance of IOwnership.

public interface IOwnership {
    object Owner();
    void SetOwner(object value);
    int Depth();
}

Listing 6.12: The IOwnership interface implemented by all types emitted by the Zal compiler when parent pointer based tracking is selected.

In addition to implementing one of these IOwnership interfaces, classes emitted by the

Zal compiler also have fields added to store any additional context parameters declared

on the Zal class. No additional methods are required to manipulate these fields since


they are used only within the class’s implementation.

For the sake of brevity and clarity, the remainder of the implementation will be de-

scribed using the pointer based version of the IOwnership interface. The Dijkstra

Views based implementation can be trivially derived from the presented code and is

also available for download from [32].

So that the methods of the IOwnership interface can be called on any object, I wrote a

class of extension methods which call the appropriate ownership methods depending on

the type of the objects passed in. There is separate handling of IOwnership instances,

arrays, and non-IOwnership instances. The special handling of arrays is discussed

in detail in Section 6.4.1.4. Listing 6.13 shows the details of this class of extension

methods.

Once a Zal program has been compiled, other Zal programs may link against that

program to make use of one or more parts of its functionality. To facilitate reasoning

across compiled code modules, it is desirable to preserve formal context parameter and

effect declarations in an easily accessible format. This lets the Zal compiler read this

information from these compiled programs so that it can be used when compiling other

programs. In C#, custom attributes provide one means of storing such definitions.

Table 6.7 lists the custom attributes I have written to store context parameter

and effect information on definitions. These attributes are emitted in the C# source

code produced by the Zal compiler.

In addition to the custom attributes, I have also written a class of extension methods

which are used to reduce the amount of code that the compiler needs to programmati-

cally inject into the generated code. Two items in this library are used as part of the

ownership tracking process and these are shown in Listing 6.14. The OwnershipWorld

type referred to in Listing 6.14 implements IOwnership and an instance of this type

is used to represent the special top context world at runtime. The depth of world is

always 0; as the top context it has no parents, and so it returns an empty

array when asked for its parents and null when asked for its owner.
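The thesis does not reproduce OwnershipWorld's source, but its described behaviour pins down a minimal implementation; the following sketch is inferred from the text, not copied from the compiler:

```csharp
using System;

// A sketch of what OwnershipWorld's described behaviour implies:
// depth 0, no parents, and no owner of its own.
public class OwnershipWorldSketch {
    public object[] GetParents() { return new object[0]; }
    public object Owner() { return null; }
    public int Depth() { return 0; }
}
```

Any depth computation or ancestor walk therefore terminates at world, which makes it a safe default owner for objects that carry no explicit ownership information.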

When an object is instantiated, actual context parameters are supplied for the type’s

formal context parameters. Context relationship testing may involve these context

parameters so they need to be captured as part of the object’s representation. To


public static class ObjectOwnershipExtensions {
    public static object Owner(this object obj) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null)
            return ownedObj.Owner();
        Array arrayObj = obj as Array;
        if (arrayObj != null)
            return arrayObj.Owner();
        return OwnershipHelpers.world;
    }

    public static void SetOwner(this object obj, object owner) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null) {
            ownedObj.SetOwner(owner);
            return;
        }
        Array arrayObj = obj as Array;
        if (arrayObj != null) {
            arrayObj.SetOwner(owner);
            return;
        }
        throw new OwnershipException("Cannot transfer the ownership of a non-IOwnership non-Array type");
    }

    public static int Depth(this object obj) {
        IOwnership ownedObj = obj as IOwnership;
        if (ownedObj != null)
            return ownedObj.Depth();
        Array arrayObj = obj as Array;
        if (arrayObj != null)
            return arrayObj.Depth();
        return 1;
    }

    public static object AddChild(this object o, object toAdd) {
        toAdd.SetOwner(o);
        return o;
    }
}

Listing 6.13: Extension methods on object which allow ownership properties to be read from any object and set on any object which supports it, as implemented for the parent pointer version of the IOwnership interface.


Attribute                         Usage
FormalContextParametersAttribute  Stores a list of formal context parameters
                                  declared on a type or method definition.
ReadEffectAttribute               Stores the declared read effects of a method,
                                  constructor, or delegate definition.
WriteEffectAttribute              Stores the declared write effects of a method,
                                  constructor, or delegate definition.
GetReadEffectAttribute            Stores the declared get accessor read effects
                                  of a property or indexer.
GetWriteEffectAttribute           Stores the declared get accessor write effects
                                  of a property or indexer.
SetReadEffectAttribute            Stores the declared set accessor read effects
                                  of a property or indexer.
SetWriteEffectAttribute           Stores the declared set accessor write effects
                                  of a property or indexer.

Table 6.7: Table of custom attributes used to store declared context parameters and effect information. These attributes are emitted into C# source code produced by the Zal compiler.

public static class OwnershipHelpers {
    private static OwnershipWorld w;
    public static OwnershipWorld world {
        get {
            if (w == null)
                w = new OwnershipWorld();
            return w;
        }
    }
}

Listing 6.14: The items in the OwnershipHelpers class which are used to facilitate the implementation of ownership tracking.

do this, the compiler adds a private field to store each context parameter to the type

definition. The Zal compiler emits a constructor which supports the runtime tracking of

ownership and, optionally, an unmodified constructor for use by existing C# programs.

The ownership tracking version adds parameters to the constructor’s signature for

each of the type’s context parameters; context parameters are passed as objects at

runtime. When a new object instantiation is emitted by the compiler, it supplies the

actual context parameters as additional parameters to the constructor. The unmodified

version of the constructor for use by existing C# programs leaves all of the context

parameter fields set to their default initial value of the top context world. Finally, the

declared context parameters are stored on the emitted type definition using a custom


attribute.

Listing 6.15 shows a small example class written in Zal which has some formal context

parameters and a set of declared read and write effects on the constructor. It also shows

how the example class is implemented in C# by the Zal compiler using the previously

mentioned libraries, interfaces and code injection techniques.

// Zal class declaration
public class ExampleOwnershipClass[owner, data] {
    public ExampleOwnershipClass(int value) reads <> writes <> {
        ...
    }
}

// C# implementation of the above class declaration
[FormalContextParameters("owner", "data")]
public class ExampleOwnershipClass : IOwnership {
    private IOwnership Context_owner;
    private IOwnership Context_data;

    [ReadEffect(), WriteEffect()]
    public ExampleOwnershipClass(int value) {
        Context_owner = OwnershipHelpers.world;
        Context_data = OwnershipHelpers.world;
        OwnershipHelpers.world.AddChild(this);
        ...
    }

    [ReadEffect(), WriteEffect()]
    public ExampleOwnershipClass(int value, IOwnership Context_ownerparm,
                                 IOwnership Context_dataparm) {
        Context_owner = Context_ownerparm;
        Context_owner.AddChild(this);
        Context_data = Context_dataparm;
        ...
    }
}

Listing 6.15: The implementation of a Zal class, with formal context parameters and declared constructor effects, in C# using custom attributes and the OwnershipHelpers library of helper methods.

Method formal context parameters are transformed into call parameters and local vari-

ables by the Zal compiler. As was the case with type constructors, the compiler can emit

two versions of all methods annotated with formal context parameters. One method

has the same list of call parameters as the original method declaration. This method

adds local variables at the start of the method body which correspond to the declared

method formal context parameters, if any. This is done so that even if the method

is called from code that is not ownership aware, ownership aware code called by the

method is supplied reasonable default ownership information. The second version of

the method generated adds the method formal context parameters as call parameters


so they can be supplied at method invocation time. An example of this implementa-

tion is shown in Listing 6.16. It is interesting to note that because no constructors are

declared for the MethodExample class, the compiler emits a default constructor so that

the class’s formal context parameters can be handled as previously discussed.

// An example of a method with formal context parameters written in Zal
public class MethodExample[owner] {
    public void operation[contextParm](Object|contextParm| value)
            reads <this, contextParm> writes <> {
        ...
    }
}

// The above example as implemented in C# by the Zal compiler
[FormalContextParameters("owner")]
public class MethodExample : IOwnership {
    private IOwnership Context_owner = OwnershipHelpers.world;

    [ReadEffect(), WriteEffect()]
    public MethodExample(IOwnership Context_ownerparm) : base() {
        Context_owner = Context_ownerparm;
    }

    [FormalContextParameters("contextParm")]
    [ReadEffect("this", "contextParm"), WriteEffect()]
    public void operation(Object value, IOwnership Context_contextParm) {
        ...
    }
    ...
}

Listing 6.16: The implementation of a Zal method with declared formal context parameters and effects, in C# using custom attributes and OwnershipHelpers.

It is important to note that when constructors or methods declare side-effects, the

compiler will emit ReadEffects and WriteEffects annotations with these declared

effects. Even when constructors and methods do not declare their effects, the compiler

emits the effect annotations with the computed read and write effects of the constructor

or method. This is done so that the compiler can read in effect information from the

Zal DLLs during compilation rather than trying to recompute the effects.
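Recovering declared effects from a compiled DLL amounts to reading these attributes back via reflection. The sketch below models ReadEffectAttribute on the thesis's description rather than reproducing the compiler's source; GetCustomAttributes is the standard .NET reflection API:

```csharp
using System;
using System.Reflection;

// Sketch of recovering declared effects from compiled code by reflection.
// ReadEffectAttribute is modelled on the attributes described in the text.
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor)]
public class ReadEffectAttribute : Attribute {
    public string[] Contexts;
    public ReadEffectAttribute(params string[] contexts) { Contexts = contexts; }
}

public class Example {
    [ReadEffect("this", "data")]
    public void M() { }
}

public static class EffectReader {
    public static string[] ReadEffectsOf(MethodInfo method) {
        // Fetch the attribute instances stored in the compiled metadata.
        var attrs = method.GetCustomAttributes(typeof(ReadEffectAttribute), false);
        return attrs.Length > 0
            ? ((ReadEffectAttribute)attrs[0]).Contexts
            : new string[0];
    }
}
```

Because the attributes survive compilation as metadata, a later compilation can consult them without re-running effect inference over the referenced module's source.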

6.4.1.2 Properties & Indexers

As was discussed in Section 4.3.2.1 and Section 4.3.2.2, properties and indexers are

both syntactic sugar for methods. Unlike methods, neither indexers nor properties

can be parameterized with formal context parameters. However, like methods, the Zal

compiler emits either the declared, when present, or computed effects for the get and


set accessors in properties and indexers.

C# does not support attributes on accessors and so their effect information needs to be

stored in attributes on the accessor’s containing indexer or property. This means that

an indexer or property may be parameterized by two different sets of read and write

effects, one each for its get and set accessors. To distinguish the effect sets for the get

accessor from those of the set accessor, I have extended the custom attributes used

to store method effects (ReadEffects and WriteEffects). The derived extensions do

not modify the functionality of the originally declared ReadEffects and WriteEffects

attributes, but the different types are used to distinguish the effect declarations. The

GetReadEffects and GetWriteEffects attributes store the side-effects of the get acces-

sor while the SetReadEffects and SetWriteEffects do the same for the set accessor.

Because indexers and properties are not parameterized with formal context parameters,

the effects can be named only by using the formal context parameters available in the

surrounding type definition scope.

Listing 6.17 shows a Zal class with a property that has declared read and write effects

as well as how that class would be implemented by the compiler using the custom

attributes discussed. Notice the four effect attributes which correspond to the read

and write effects of the get and set accessors since C# does not allow accessors to have

attributes. The effect declaration syntax and attributes for indexers are the same as

those shown for the property.

// An example of a Zal class with a property
public class PropertyExample[owner, data] {
    public int Value {
        get reads <this,data> writes <> { ... }
        set reads <this> writes <data> { ... }
    }
}

// The above example as implemented in C#
[FormalContextParameters("owner", "data")]
public class PropertyExample : IOwnership {
    [GetReadEffects("this", "data"), GetWriteEffects(),
     SetReadEffects("this"), SetWriteEffects("data")]
    public int Value {
        get { ... }
        set { ... }
    }
}

Listing 6.17: An example of the implementation of a Zal property in C#.


6.4.1.3 Sub-contexts

Like other contexts in the system, sub-contexts are represented by objects at run-

time. Because sub-contexts do not correspond naturally with existing objects in the

program, special objects need to be instantiated to represent them. The runtime owner-

ship library provides the SubContext class which implements the IOwnership interface.

Instances of this class store only the array of ancestor references and provide a min-

imal implementation of the interface. Listing 6.18 shows the implementation of the

SubContext class.

public class SubContext : IOwnership {
    private object parent;

    public SubContext(object owner) {
        owner.AddChild(this);
        parent = owner;
    }

    public object Owner() {
        return parent;
    }

    public void SetOwner(object owner) {
        parent = owner;
    }

    public int Depth() {
        return parent.Depth() + 1;
    }
}

Listing 6.18: The implementation of the SubContext class which is used to represent sub-contexts declared in type definitions.

In a type which declares sub-contexts, each sub-context is emitted as an instance field

which is initialized to a new instance of the SubContext class. The sub-context can

then be passed around or used in relationship tests just as with any other context.

Listing 6.19 is an example of a node in a binary tree which has two sub-contexts to

hold the representations of the left and right branches respectively. When compiled to

C#, notice that the sub-context declarations become fields which are set to new instances

of the SubContext type. The sub-context objects are then passed to constructors as

with any other context.


// a binary tree node implementation in Zal
public class BinaryTreeNode<T>[owner] {
    subcontexts l, r;
    private BinaryTreeNode<T>|l| left;
    private BinaryTreeNode<T>|r| right;
    private T data;

    public BinaryTreeNode(T data) reads <> writes <> {
        this.data = data;
    }
    public void addLeft(T data) reads <l> writes <> {
        left = new BinaryTreeNode<T>|l|(data);
    }
    public void addRight(T data) reads <r> writes <> {
        right = new BinaryTreeNode<T>|r|(data);
    }
    ...
}

// implementation of the above class in C#
[FormalContextParameters("owner")]
public class BinaryTreeNode<T> : IOwnership {
    private IOwnership Context_owner = OwnershipHelpers.world;
    private IOwnership Context_l;
    private IOwnership Context_r;
    private BinaryTreeNode<T> left;
    private BinaryTreeNode<T> right;
    private T data;

    [ReadEffect(), WriteEffect()]
    public BinaryTreeNode(T data, IOwnership Context_ownerparm) {
        Context_owner = Context_ownerparm;
        Context_l = new SubContext(this);
        Context_r = new SubContext(this);
        this.data = data;
    }
    [ReadEffect(), WriteEffect("l")]
    public void addLeft(T data) {
        left = new BinaryTreeNode<T>(data, Context_l);
    }
    [ReadEffect(), WriteEffect("r")]
    public void addRight(T data) {
        right = new BinaryTreeNode<T>(data, Context_r);
    }
    ...
}

Listing 6.19: An example of the use of sub-contexts as part of the implementation of a binary tree node.


6.4.1.4 Arrays

All arrays in C# are instances of the System.Array type. This type does not implement

the IOwnership interface since it is supplied as part of the .NET Platform. As discussed

in Section 4.3.3.4, each array in Zal can have an owning context parameter. If no owner

is specified, the array defaults to being owned by the world context. The owner of array

objects needs to be tracked at runtime as does any other object with formal context

parameters. Since modifying the implementation of the System.Array type directly

is not an option, I have written a static helper class, ArrayOwnershipExtensions, to

store each array’s ownership information.

The static helper class, ArrayOwnershipExtensions, maintains a dictionary which

maps array objects to their respective owning contexts. To avoid having this helper

class cause memory leaks by holding on to array references, the dictionary used is a

custom written WeakDictionary which stores its keys using weak references which do

not contribute to an object’s incoming reference count for garbage collection purposes.

The helper class also supplies a number of extension methods which add the methods

of the IOwnership interface to the System.Array type. This allows the methods of the

IOwnership interface to be invoked on arrays. Listing 6.20 shows the implementation

of the ArrayOwnershipExtensions class.

public static class ArrayOwnershipExtensions {
    private static WeakDictionary<Array, object> dictionary =
        new WeakDictionary<Array, object>(new ArrayComparator());

    public static void SetOwner(this Array array, object value) {
        dictionary[array] = value;
    }

    public static int Depth(this Array array) {
        return dictionary.ContainsKey(array) ?
            dictionary[array].Depth() + 1 : 1;
    }

    public static object Owner(this Array array) {
        return dictionary.ContainsKey(array) ?
            dictionary[array] : OwnershipHelpers.world;
    }
}

Listing 6.20: The static ArrayOwnershipExtensions class, which stores ownership information for arrays and provides access to it via extension methods on System.Array. These methods provide the same functionality as is required of types which implement the IOwnership interface in the parent-pointer based system.


Listing 6.21 shows an example of an array in Zal which has an ownership parameter

and how that is implemented by the compiler. The special handling of the arrays

shown in Listing 6.13 is necessary because the System.Array type does not implement

IOwnership even though it supplies the methods required. Although extension methods

allow methods to be added to an existing type, they do not allow additional interfaces

to be implemented. The special handling is, therefore, required to invoke the array

versions of the IOwnership methods.

public class ArrayExample[owner] {
    public Object|owner|[]|this| value;
    public ArrayExample() reads <> writes <> {
        value = new Object|owner|[5]|this|();
    }
}

[FormalContextParameters("owner")]
public class ArrayExample : IOwnership {
    private IOwnership _context_owner = OwnershipHelpers.world;
    public Object[] value;

    [ReadEffects(), WriteEffects()]
    public ArrayExample() {
        value = new Object[5];
        this.AddChild(value);
    }

    [ReadEffects(), WriteEffects()]
    public ArrayExample(IOwnership _context_ownerparm) {
        _context_owner = _context_ownerparm;
        value = new Object[5];
        this.AddChild(value);
    }
    ...
}

Listing 6.21: An example of the creation of an array object in Zal and how the ownership of the array is set. Notice that the standard AddChild method is called to set the ownership of the array; AddChild calls the SetParents method on Object, which in turn calls the SetParents extension method on the System.Array type.

6.4.1.5 Statics

The handling of static fields, methods and properties is one of the most complicated aspects of the runtime ownership system. The syntax and semantics of the various static language features in Zal were discussed in Section 4.4. This section focuses on how those syntax and semantics can be implemented in the C# code emitted by the compiler.

Fields


In C#, static fields on generic types are specific to the named instantiation of the generic type, since the generic type parameters can be used to construct the types of static fields. For consistency, I allowed context parameters on classes to be used to construct the types of static fields. This means that it is necessary to make static fields specific to the named instantiation of the type; a different static field is associated with each set of actual context parameters.

Because C# does not have any notion of context parameters, it is necessary to implement static fields on classes with context parameters as methods which accept contexts, in the form of IOwnership objects, which are then used to look up or set the value of the static field for the particular context parameters supplied. All of the different static fields are stored in a map where the key is the list of context parameters supplied and the value is the value of the static field for the given contexts. Because of this choice of implementation, Zal prohibits the use of static fields with ownership information as ref parameters; the implementation I have chosen is not compatible with them. Other, more complex, implementations could overcome this limitation.

To help implement static fields, I wrote a special ContextMap class, which is a specialized Dictionary. It accepts a list of context parameters as a key to look up and set values in the dictionary. The implementation of the ContextMap class is shown in Listing 6.22. The key to the implementation is ContextStruct's override of Equals.
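The essential idea, keys that are lists of contexts compared element-wise by reference identity (mirroring isSameAs), can be sketched in Python; the class and method names below are illustrative, not the thesis's C# API:

```python
class ContextMap:
    """Per-context-parameter storage: keys are tuples of context objects
    compared element-wise by identity, mirroring ContextStruct.Equals."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _key(contexts):
        # isSameAs is reference equality, so object identity is the key.
        # (Assumes the context objects stay alive while the map is in use.)
        return tuple(id(c) for c in contexts)

    def __getitem__(self, contexts):
        return self._values[self._key(contexts)]

    def __setitem__(self, contexts, value):
        self._values[self._key(contexts)] = value

    def contains(self, contexts):
        return self._key(contexts) in self._values
```

For example, two distinct context objects index two distinct static-field slots:

```python
class Ctx: pass
a, b = Ctx(), Ctx()
m = ContextMap()
m[(a,)] = 1
m[(b,)] = 2
assert m[(a,)] == 1 and m[(b,)] == 2
```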

Each static field in a class parameterized with context parameters is converted into a static ContextMap. Reads of the field are transformed into calls to a static accessor method which accepts a list of context parameters and returns the field value stored in the ContextMap for the particular set of context parameters supplied. Similarly, writes are transformed into calls to a modifier method which accepts a list of context parameters and a value and stores the specified value in the ContextMap under the list of context parameters supplied. An example of the implementation of a static field is shown in Listing 6.23. In this example, the static field is public, so the field is retained in the implemented version of the code for backwards compatibility, but the compiler will always use the get and set methods when the field is accessed with actual context parameters supplied on the type.

Methods


public class ContextMap<T> {
    private struct ContextStruct {
        IOwnership[] contextParams;

        public ContextStruct(params IOwnership[] contexts) {
            contextParams = contexts;
        }

        public override bool Equals(object obj) {
            if (obj is ContextStruct) {
                if (contextParams.Length !=
                        ((ContextStruct)obj).contextParams.Length)
                    return false;
                for (int i = 0; i < contextParams.Length; ++i)
                    if (!contextParams[i].isSameAs(
                            ((ContextStruct)obj).contextParams[i]))
                        return false;
                return true;
            }
            return false;
        }
    }

    private Dictionary<ContextStruct, T> values;

    public ContextMap() {
        values = new Dictionary<ContextMap<T>.ContextStruct, T>();
    }

    public T this[params IOwnership[] contexts] {
        get {
            return values[new ContextMap<T>.ContextStruct(contexts)];
        }
        set {
            values[new ContextMap<T>.ContextStruct(contexts)] = value;
        }
    }

    public bool Contains(params IOwnership[] contexts) {
        return values.ContainsKey(new ContextMap<T>.ContextStruct(contexts));
    }
}

Listing 6.22: The implementation of the ContextMap data structure used to implement static fields in classes parameterized by context parameters.

Static methods, like static fields, may make use of the formal context parameters of their containing type just as instance methods do. Instance methods are able to read the type's actual context parameters from the instance fields which were initialized when the type was instantiated. Static methods do not have access to these fields since they are not associated with a particular instance of the class. This means that the actual context parameters specified on the type on which the static method is invoked need to be passed through to the method body in addition to any method-level context parameters. Listing 6.24 shows how a static method is implemented in C# by the Zal compiler. Notice that two versions of the static method are generated. The first version


public class Example[owner] {
    public static Example|owner| val;
}

[FormalContextParameters("owner")]
public class Example {
    public static Example val;
    private static ContextMap<Example> _val_value =
        new ContextMap<Example>();

    public static Example get_static_val(IOwnership _context_owner) {
        if (!_val_value.Contains(_context_owner))
            return default(Example);
        return _val_value[_context_owner];
    }

    public static void set_static_val(Example value,
            IOwnership _context_owner) {
        _val_value[_context_owner] = value;
    }

    private IOwnership _context_owner = OwnershipHelpers.world;

    public Example(IOwnership _context_ownerparm) {
        _context_owner = _context_ownerparm;
    }
    ...
}

Listing 6.23: An example of the implementation of a Zal static field in C#. Notice that, because the static field could be accessed from outside the class, the plain field is retained for backwards compatibility, but getter and setter methods are supplied for context-aware code.

has no context parameters and is implemented for backwards compatibility purposes.

The second has call parameters for the enclosing type’s actual context parameters as

well as any call parameters required for the method’s context parameters.

public static class Example[data] {
    public static void operation() {
        ...
    }
}

[FormalContextParameters("data")]
public static class Example {
    [ReadEffects(...), WriteEffects(...)]
    public static void operation(IOwnership _context_owner) {
        ...
    }
}

Listing 6.24: An example of how a static method is implemented in C# when the containing type has formal context parameters.


Properties

Properties are C# syntactic sugar for accessor and mutator methods, allowing those methods to be invoked using a field-style notation. Static properties are, therefore, syntactic sugar for static methods. These static properties, like static methods, can make use of the context parameters of the enclosing type. As was the case with methods, these context parameters need to be marshalled into the static "method" by way of additional call parameters. Unfortunately, properties do not support explicit parameters. This means that static methods need to be written to mirror the functionality of the static property so that the context parameters can be marshalled through. Listing 6.25 shows an example of a static property written in Zal and its implementation in C#.

public static class Example[data] {
    public static Example|data| Instance {
        get { ... }
        set { ... }
    }
}

[FormalContextParameters("data")]
public static class Example {
    [GetReadEffects(...), GetWriteEffects(...)]
    public static Example get_Instance(IOwnership _context_data) { ... }

    [SetReadEffects(...), SetWriteEffects(...)]
    public static void set_Instance(Example value,
            IOwnership _context_data) { ... }
}

Listing 6.25: An example of the implementation of a Zal static property in C#. The original property can optionally be retained for use by existing C# programs, but is omitted from the listing above for clarity. The get and set methods are used by ownership-aware code to marshall context parameters to the accessor implementations.

6.4.2 Enhanced Foreach Loop

The foreach loop in C# operates only on collections and other data sources which implement the IEnumerable interface. In Section 2.4.1.2 I proposed an enhanced foreach loop which exposes the ability to update items in the collection being traversed as well as providing access to a traversal index which indicates where in the collection the current item is located. Like the traditional foreach loop, the enhanced foreach loop can operate only on data sources and collections which implement a specific interface used to implement the loop during code generation. This section


describes the implementation of enhanced foreach loops and the interfaces involved.

The body of an enhanced foreach loop can be thought of as an anonymous method which accepts as parameters the element being processed and, optionally, the index of that element. The enhanced foreach loop has two optional features: the ref keyword, which enables write-through updates of the collection element, and the loop index. There are, therefore, four combinations of these features, producing the four loop body method signatures shown in Table 6.8. Three loop body delegates need to be accepted by enhanced foreach loop implementations; in the case where neither optional enhancement is employed, the loop is a traditional foreach loop and can be implemented as usual without any further special handling.

ref option   index option   Delegate
    X             X         void Body<T, I>(ref T element, I index)
    X                       void BodyNoIndex<T>(ref T element)
                  X         void BodyNoRef<T, I>(T element, I index)
                            void BodyStandard<T>(T element)*

Table 6.8: Enhanced foreach loop body delegates based on the optional enhancements declared in the loop header. *Note that the loop body without either of the optional enhancements is a traditional foreach loop and can be handled using the IEnumerable interface as usual.

In order for a collection to support the enhanced foreach loop, it needs to implement

at least one of the three methods shown in the list below. Only the methods sufficient

for the specific use of the collection in question are required.

• void EnhancedLoop<T, I>(Body<T, I> body)

• void EnhancedLoop<T>(BodyNoIndex<T> body)

• void EnhancedLoop<T, I>(BodyNoRef<T, I> body)

These methods execute the loop sequentially and ensure the semantics expected by the body delegate with regard to updates to the element being processed. If the ref keyword is present on the element parameter, assignments to the element passed to the body delegate should be reflected in the collection.
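The write-back semantics of the ref element can be sketched language-agnostically in Python, where a returned value plays the role of the C# ref parameter (the function name and shape are illustrative, not the library's API):

```python
def enhanced_loop(collection, body):
    """Sequential enhanced foreach over an indexable collection.

    body(element, index) returns the (possibly replaced) element; writing it
    back into the collection emulates C#'s ref write-through semantics.
    """
    for i in range(len(collection)):
        collection[i] = body(collection[i], i)

nums = [1, 2, 3]
enhanced_loop(nums, lambda elem, i: elem * elem)
assert nums == [1, 4, 9]   # updates are reflected in the collection
```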

To support parallel execution of an enhanced foreach loop, there are three additional


methods which a collection may implement. As with the EnhancedLoop methods, only

those needed for the specific use of the collection are required.

• void ParallelEnhancedLoop<T, I>(Body<T, I> body)

• void ParallelEnhancedLoop<T>(BodyNoIndex<T> body)

• void ParallelEnhancedLoop<T, I>(BodyNoRef<T, I> body)

These methods should execute the loop in parallel and respect the semantics expected by the body delegate with regard to element updates.
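The parallel variant has the same write-back contract but distributes the iterations across worker threads; this only preserves the sequential semantics when iterations touch disjoint state, which is precisely what the effect system verifies. A minimal Python sketch (illustrative names, not the library's API):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_enhanced_loop(collection, body):
    """Parallel enhanced foreach: each index is processed independently.

    Safe only when iterations are disjoint; here each iteration reads and
    writes a distinct slot of the collection.
    """
    def step(i):
        collection[i] = body(collection[i], i)

    with ThreadPoolExecutor() as pool:
        list(pool.map(step, range(len(collection))))

nums = list(range(8))
parallel_enhanced_loop(nums, lambda elem, i: elem + 10)
assert nums == [10, 11, 12, 13, 14, 15, 16, 17]
```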

I provided implementations of all six of these methods for the IList and IDictionary interfaces as part of the enhanced foreach loop library. Listing 6.26 shows two of the EnhancedLoop implementations to illustrate how these methods are written for these collections.

public static void EnhancedLoop<T>(this IList<T> collection,
        Body<T, int> body) {
    for (int i = 0; i < collection.Count; ++i) {
        T tempElem = collection[i];
        body(ref tempElem, i);
        collection[i] = tempElem;
    }
}

public static void EnhancedLoop<K, V>(this IDictionary<K, V> dictionary,
        Body<V, K> body) {
    foreach (K key in new List<K>(dictionary.Keys)) {
        V value = dictionary[key];
        body(ref value, key);
        dictionary[key] = value;
    }
}

Listing 6.26: Sample implementations of EnhancedLoop for the IList and IDictionary interfaces.

The enhanced foreach loop library I have written also supplies two interfaces which new collections or data sources may choose to implement: IEnhancedEnumerable and IIndexedEnumerable. Both of these interfaces have the full set of EnhancedLoop and ParallelEnhancedLoop methods implemented for them. The difference between the interfaces is how the index values are generated. Listing 6.27 shows the two interfaces and Listing 6.28 shows samples of how the EnhancedLoop methods for these interfaces are implemented.


public interface IEnhancedEnumerable<I, T> {
    IEnumerable<I> GetIndices();
    IEnumerable<T> GetValues();
    void SetValue(I index, T value);
}

public interface IIndexedEnumerable<T> {
    int Start { get; }
    int End { get; }

    T this[int index] {
        get;
        set;
    }
}

Listing 6.27: The two interfaces supplied with the enhanced foreach loop library which collections can implement so they can be used with the enhanced foreach loop.

6.5 Parallelization

The goal of the entire type and effect system implemented by the compiler I have written is reasoning about parallelism. In Section 3.5, I derived sufficient conditions, expressed in terms of context relationships, for the safe application of three different parallelism patterns: task parallelism, data parallelism, and loop pipelining. In this section, I examine how the compiler tests these sufficient conditions, how it generates runtime context relationship checks to facilitate conditional parallelism when required relationships cannot be statically verified, and how and when the different parallelism patterns are applied.

6.5.1 Context Relationship Testing

When testing the relationship between two contexts at compile time, the comparison may produce one of three results: the relationship is known to hold, the relationship is known not to hold, or the relationship is unknown. When a parallelization condition tested by the compiler contains at least one relationship test which evaluates to unknown, it may be desirable to build a set of context relationship conditions which, if satisfied at runtime, would allow the parallel version of the code to execute safely.

In the compiler, I have constructed a data type, ConstraintList, which stores con-

text relationship information. This data type can be used to store either relationships

which are known to hold, or relationships which would be sufficient to allow for the safe


public static void EnhancedLoop<T, I>(this IEnhancedEnumerable<I, T> source,
        Body<T, I> body) {
    IEnumerator<I> indexItr = source.GetIndices().GetEnumerator();
    IEnumerator<T> valuesItr = source.GetValues().GetEnumerator();
    while (indexItr.MoveNext() && valuesItr.MoveNext()) {
        T temp = valuesItr.Current;
        body(ref temp, indexItr.Current);
        source.SetValue(indexItr.Current, temp);
    }
}

public static void ParallelEnhancedLoop<T>(this IIndexedEnumerable<T> source,
        Body<T, int> body) {
    Parallel.ForEach(Enumerable.Range(source.Start, source.End - source.Start),
            (int i) => {
        T temp = source[i];
        body(ref temp, i);
        source[i] = temp;
    });
}

Listing 6.28: Sample implementations of the EnhancedLoop method for the library-supplied IEnhancedEnumerable interface and the ParallelEnhancedLoop method for IIndexedEnumerable. ParallelEnhancedLoop makes use of the Microsoft Task Parallel Library (TPL) parallel foreach loop implementation (see Section 6.5.2.1).

application of a parallelizing code transformation. I chose not to store this information within Context objects because relationships between contexts may vary between different lexical scopes within a user-defined type. This design makes handling these changing relationships more straightforward in the compiler.

Listing 6.29 shows the methods of the ConstraintList. Note that if a relationship is added to a ConstraintList which would violate an existing constraint, then the constraint set is said to be unsatisfiable. If this happens at runtime, the program has reached an inconsistent state and an appropriate error is generated by the runtime system.

The most interesting of the methods shown in Listing 6.29 is conditionallyExecute, which generates conditionally parallel implementations of the parallelism patterns recognized by the compiler.

During effect computation, an OwnershipEnv object is passed into the heap effect computation methods to provide context information to the effect computation process. One piece of information the OwnershipEnv carries is the set of all statically known context relationships. This information is used when determining the relationships between two or more ownership contexts. If one or more of the relationships required by the sufficient conditions is not known to hold in the OwnershipEnv, then a


public class ConstraintList {
    // context relationship tests
    public bool IsDominatedBy(Context dominee, Context dominator);
    public bool IsDominatedByOrEq(Context dominee, Context dominator);
    public bool IsDisjointFrom(Context context1, Context context2);
    public bool IsEqual(Context context1, Context context2);

    // add context relationships to the constraint list
    public bool AddDomination(Context dominee, Context dominator);
    public bool AddDominationOrEquality(Context dominee, Context dominator);
    public bool AddEquality(Context context1, Context context2);
    public bool AddDisjoint(Context context1, Context context2);

    // generate runtime context relationship tests
    public Statement conditionallyExecute(Statement parallelVersion,
            Statement sequentialVersion);
}

Listing 6.29: The ConstraintList interface showing context relationship addition, testing, and runtime test generation methods.

ConstraintList object is constructed to hold the conditions which need to be tested

at runtime. The basic pattern for using the ConstraintList is as follows:

1. Test if the context relationship(s) called for by the sufficient conditions is known

to hold based on the current OwnershipEnv environment.

2. If the context relationship is unknown, add it to the ConstraintList.

3. If an exception is thrown when adding the relationship, the sufficient condition cannot be satisfied.

Once all context relationships for a sufficient condition have been met or added to the ConstraintList, the conditionallyExecute method can be used to generate code which tests the context relationships at runtime to see if the relationships hold.
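The pattern above can be sketched in simplified form in Python; the real ConstraintList stores typed context relationships and emits an AST if-statement, whereas the sketch below just accumulates runtime predicates (all names are illustrative):

```python
class ConstraintList:
    """Simplified sketch: accumulates relationship tests that could not be
    verified statically, to be checked at run time."""

    def __init__(self):
        self._tests = []            # zero-argument predicates

    def add(self, predicate):
        self._tests.append(predicate)

    def conditionally_execute(self, parallel_version, sequential_version):
        # Mirrors the emitted if-statement: run the parallel version only
        # if every deferred relationship test holds at run time.
        if all(t() for t in self._tests):
            parallel_version()
        else:
            sequential_version()

ran = []
c = ConstraintList()
c.add(lambda: True)                 # e.g. a disjointness test that holds
c.conditionally_execute(lambda: ran.append("parallel"),
                        lambda: ran.append("sequential"))
assert ran == ["parallel"]
```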

The conditionallyExecute method takes two statements as parameters: the parallel and sequential versions of the code. The method takes all of the context constraints stored in the ConstraintList and generates an if statement whose condition tests whether the stored relationships hold. If they hold at runtime, the parallelVersion is executed; otherwise the sequentialVersion is run. The runtime context relationship tests are made using one of several extension methods, each of which tests a specific context relationship. The calls to these methods are emitted by the compiler as part of the condition on a conditionally parallelized code block. Listing 6.30 shows these operators as implemented in the runtime


ownership library.

public static bool isDominatorOf(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    if (lhsDepth >= rhsDepth)
        return false;
    object toCheck = rhs;
    for (int i = rhsDepth; i > lhsDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck == lhs;
}

public static bool isDescendentOf(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    if (lhsDepth <= rhsDepth)
        return false;
    object toCheck = lhs;
    for (int i = lhsDepth; i > rhsDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck == rhs;
}

public static bool isDisjointFrom(this object lhs, object rhs) {
    int lhsDepth = lhs.Depth(), rhsDepth = rhs.Depth();
    object top = lhs, bottom = rhs;
    int topDepth = lhsDepth, bottomDepth = rhsDepth;
    if (lhsDepth > rhsDepth) {
        top = rhs; bottom = lhs;
        topDepth = rhsDepth; bottomDepth = lhsDepth;
    }
    object toCheck = bottom;
    for (int i = bottomDepth; i > topDepth; --i)
        toCheck = toCheck.Owner();
    return toCheck != top;
}

public static bool isSameAs(this object lhs, object rhs) {
    return lhs == rhs;
}

Listing 6.30: The extension methods in the runtime ownership library used to test the relationships between arbitrary contexts.
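The parent-pointer walk behind these tests can be illustrated with a small Python sketch, assuming the convention that a dominator sits closer to the world context (shallower depth) than the contexts it dominates; the class and function names are illustrative:

```python
class Context:
    """A context with a parent pointer; depth = distance from world."""

    def __init__(self, owner=None):
        self.owner = owner          # None means this is the world context

    def depth(self):
        d, c = 0, self
        while c.owner is not None:
            d, c = d + 1, c.owner
        return d

def is_dominator_of(lhs, rhs):
    # lhs dominates rhs iff lhs is a proper ancestor on rhs's owner chain:
    # walk rhs up until it is as shallow as lhs, then compare identities.
    if lhs.depth() >= rhs.depth():
        return False
    to_check = rhs
    for _ in range(rhs.depth() - lhs.depth()):
        to_check = to_check.owner
    return to_check is lhs

world = Context()
parent = Context(world)
child = Context(parent)
sibling = Context(parent)
assert is_dominator_of(parent, child)
assert not is_dominator_of(child, parent)
assert not is_dominator_of(sibling, child)
```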

6.5.2 Implementation

Having discussed how the compiler tests the sufficient conditions for parallelization, it remains to discuss when and how the compiler exposes inherent parallelism. Determining when to explicitly expose inherent parallelism to improve performance is difficult. I assume there is some oracle that could be consulted to tell the compiler when parallelization will result in a net increase in application performance. Unfortunately, such an oracle is beyond the current state of the art; building one is an important open research question beyond the scope of this thesis.


Since no parallelization oracle exists, the current compiler defaults to maximum parallelization. As a result, the compiler attempts to parallelize all foreach and enhanced foreach loops in a program. The programmer can manually review the results of this parallelization and selectively revert loops to sequential operation where required. The compiler has analysis and code transformation methods to recognize and expose task parallelism, but at present these are disabled by default due to the lack of a parallelization oracle. The parallelization of a code block is performed by invoking the Parallelize method on the block; the method returns an AST for the parallel version of the original AST sub-tree rooted at the node on which it was invoked.

The compiler currently uses the Microsoft Task Parallel Library (TPL) to implement

the task and data parallelism patterns [80]. The TPL provides constructs for data and

task parallelism and is designed to make it easy to write parallel programs, but it does

not provide any validity checking.

6.5.2.1 Data Parallelism

I have developed sufficient conditions for two loop parallelism patterns: data parallelism

and pipelining. The TPL provides direct support for the implementation of the data

parallelism pattern in the form of the Parallel.ForEach method. This method accepts

an enumerable data source and a method which is the loop body and which accepts a

single element from the enumerable data source as a parameter.

When the sufficient conditions for the parallelization of a sequential data parallel loop are met, the loop body becomes the body of the delegate passed to the Parallel.ForEach method along with the loop's collection, as shown in Listing 6.31. Parallel.ForEach is also used in the supplied implementations of the ParallelEnhancedLoop methods, as shown in Listing 6.32.

// Sequential Loop
foreach (Element elem in collection) {
    // loop body
}

// Parallel Loop
Parallel.ForEach(collection, (Element elem) => {
    // loop body
});

Listing 6.31: foreach loop parallelization using the TPL’s Parallel.ForEach method.


An example of a ParallelEnhancedLoop implementation using the TPL Parallel.ForEach loop is shown in Listing 6.32.

public static void ParallelEnhancedLoop(
        this IEnhancedEnumerable collection, Body<object, object> body) {
    Parallel.ForEach(collection.GetIndices(), (object index) => {
        object tempElem = collection[index];
        body(ref tempElem, index);
        collection[index] = tempElem;
    });
}

Listing 6.32: An example of how a parallel enhanced foreach loop would be implemented.

In cases when the compiler cannot statically determine if the sufficient conditions for

loop parallelization are met, the compiler emits a conditionally parallel implementation

of the loop. An example of such a conditional implementation is shown in Listing 6.33.

if ( /* context relationship test */ ) {
    // Parallel Loop
    Parallel.ForEach(collection, (Element elem) => {
        // loop body
    });
} else {
    // Sequential Loop
    foreach (Element elem in collection) {
        // loop body
    }
}

Listing 6.33: Conditional foreach loop parallelization using the TPL's Parallel.ForEach method.

6.5.2.2 Pipelining

Section 2.4.2 described loop pipelining as a technique to stage the execution of loop iterations so as to extract a limited amount of parallelism when full data parallelism is not possible due to loop-carried dependencies. There are a number of existing algorithms for scheduling loop bodies for pipelined execution; my effect system can be used to compute the data dependency information required by these pipelining algorithms. Once the loop has been scheduled for pipelined execution, there are a number of different techniques for implementing the pipeline stages.

At present, the Zal compiler uses a simple algorithm to detect the dependencies between


statements in the loop body. This information is then used to break the statements in the loop body into stages. Having identified the pipeline stages, the compiler implements the pipeline using a custom library I have written, which treats the pipeline as a sequence of concurrently executing stages. Each stage consumes input from a buffer or stream source and produces output to an output buffer. The buffers between stages ensure that pipeline stages do not need to block waiting for subsequent stages to consume their next input values. The loop stages are connected to form a sequence based on the dependencies between stages, if any. This is not the only way to detect and schedule a loop for pipelined execution, but it serves to demonstrate the practicality of the approach.
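The buffered-stage structure described above can be sketched in Python with one thread per stage and a FIFO queue between adjacent stages (an illustrative sketch of the general technique, not the thesis's C# pipeline library):

```python
import queue
import threading

def run_pipeline(source, stages):
    """Run items from source through a chain of single-threaded stages.

    Each stage runs in its own thread, reading from an input buffer and
    writing to an output buffer, so a stage never blocks waiting for its
    successor to consume a value. FIFO buffers preserve item order.
    """
    DONE = object()                                   # end-of-stream marker
    buffers = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(stage, inq, outq):
        while True:
            item = inq.get()
            if item is DONE:
                outq.put(DONE)                        # propagate shutdown
                return
            outq.put(stage(item))

    threads = [threading.Thread(target=worker,
                                args=(s, buffers[i], buffers[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()

    for item in source:                               # feed the first stage
        buffers[0].put(item)
    buffers[0].put(DONE)

    results = []                                      # drain the last buffer
    while True:
        item = buffers[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

out = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])
assert out == [4, 6, 8]
```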

The pipeline library supplies two publicly accessible types: Pipeline and StageAction<InType, OutType>. StageAction<InType, OutType> is a delegate which represents the action performed by a pipeline stage. The Pipeline class provides static methods used to construct pipelines. The full source code for the pipelining library is available from [32]. Listing 6.34 shows an example of creating a pipeline which processes an image through several transformations. Each stage-addition call accepts a StageAction delegate whose input type is the same as the output type of the previously constructed stage and whose output type can be anything desired. The final Run method starts the pipeline and blocks waiting for it to finish execution. The pipeline output is accessible through the IEnumerable returned by the Run method.

// Zal loop
foreach (ImageSlice|o| s in img) {
    noiseFilter.reduce(s);
    contrastFilter.balance(s);
    edgeDetector.findEdges(s);
    s.setImgType();
}

// pipelined version of Zal loop
Pipeline.AddFirstStage((ImageSlice|o| s) => noiseFilter.reduce(s), img)
        .AddStage((ImageSlice|o| s) => contrastFilter.balance(s))
        .AddStage((ImageSlice|o| s) => edgeDetector.findEdges(s))
        .AddStage((ImageSlice|o| s) => s.setImgType())
        .Run();

Listing 6.34: An example of using the pipelining library to create a pipeline; each stage only modifies the ImageSlice and the representation of the filter or detector in that stage, if any.


6.5.2.3 Task Parallelism

The TPL provides a number of constructs which simplify the implementation of task

parallelism. The compiler makes use of the Task<TResult> class which represents a

future — an asynchronously executed task which produces a result, the reading of

which is treated as a synchronization point. The compiler does not automatically

employ these transformations as it does not currently have a means of determining

when it is beneficial to do so; however, it does have all of the code implemented to

generate task parallel implementations when necessary. Listing 6.35 shows an example

of how task parallelism is implemented using a TPL Task as a future.

// Zal code
int x = computeValue();
...
int y = x;

// parallelized version
Task<int> x = Task<int>.Factory.StartNew(() => computeValue());
...
int y = x.Result;

Listing 6.35: The implementation of task parallelism using a TPL Task.
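The same future pattern can be sketched with Java's CompletableFuture for comparison; this is an illustrative analogue, not the compiler's actual output, and computeValue here is a trivial stand-in for the expensive computation in Listing 6.35.

```java
import java.util.concurrent.CompletableFuture;

public class FutureSketch {
    // Trivial stand-in for the expensive computation from Listing 6.35.
    static int computeValue() { return 40 + 2; }

    static int run() {
        // Start the computation asynchronously; 'x' is now a future.
        CompletableFuture<Integer> x =
            CompletableFuture.supplyAsync(FutureSketch::computeValue);
        // ... other work can proceed here concurrently with computeValue() ...
        // Reading the result is the synchronization point: join() blocks
        // until the asynchronous task has produced its value.
        int y = x.join();
        return y;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 42
    }
}
```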

6.6 Summary

In this chapter I have presented how I refactored and extended the GPC# compiler

written by our research group. I also discussed how the infrastructure for generic type

parameters was abstracted and used to implement support for the context parameters.

The key point was creating interfaces for formal and actual type parameters and adding

classes to hold multiple lists of type parameters. This infrastructure greatly simplified

the task of adding context parameters to the language.

In addition to adding context parameters to the C# language, I added effect declarations
to the language and an effect computation pass in the compiler. The effects

computed by the compiler can then be used to reason about inherent parallelism as

was discussed in Chapter 3. The sufficient conditions applied by the compiler are the

same as previously discussed. The major implementation detail discussed was how the

runtime context relationship tests are generated to facilitate conditional parallelization.


The Zal compiler is a source-to-source compiler which implements runtime ownership

tracking along with Zal semantics in C#. I have discussed these different transformations
in detail, including the need for special handling of statics. I have also presented

the design details of runtime libraries I have written to simplify the implementation of

runtime ownership tracking and context relationship testing in C#.


Chapter 7

Validation

In the preceding chapters, I have presented a system for reasoning about side-effects

and inherent parallelism in modern imperative object-oriented languages. The proposed

effect system is abstract and composable which facilitates reasoning about side-effects

in third-party libraries and other code in separate compilation units. In Chapter 5

I provided an argument, based on existing work, for the safety and soundness of the

language I have proposed. I also sketched proofs that the sufficient conditions I have

proposed preserve sequential consistency. It remains to demonstrate that my proposed

system works on realistic programs.

When trying to parallelize a sequential program, there are three main steps:

1. find the inherent parallelism in the program,

2. decide which inherent parallelism is worth exploiting, and

3. choose an implementation technology to expose the selected parallelism.

All three of these steps are difficult and there are a number of important open research

questions relating to each of them. This thesis is focused on step 1, how to find

inherent parallelism; I do not claim any contributions to step 2 or step 3. My system

will, therefore, try to identify as much inherent parallelism as possible, even if it is

not worth exploiting. The key point is that my system can identify available inherent

parallelism automatically.


The overall goal of this chapter is to demonstrate how well my proposed system works.

I aim to demonstrate, through the application of my system to realistic examples, that

1. it finds the major sources of inherent parallelism in the examples,

2. the amount of program annotation required, in terms of lines of code and effort,

is reasonable, and

3. the runtime and memory overheads of the dynamic system are reasonable in

relation to the benefit the system provides.

When working with examples of parallelization, there is always a tendency to focus on

achieving an overall performance improvement for the system in question. While I do

present performance numbers to quantify overheads, speed-up is not a direct measure

of the success of my approach because, as previously stated, I am contributing only to

one of the three steps of the parallelization process.

The remainder of this chapter presents results obtained from four worked examples

which are representative of a broad range of programs employing a variety of different

types of parallelism. Rather than presenting each example in isolation, I will present

the examples collectively, dividing the discussion into the different steps involved in

performing the validation. I begin by introducing the examples and then proceed

to: annotate the example programs to produce a Zal program, compile the annotated

programs, and, finally, determine if the major sources of parallelism have been identified

and measure the overheads of the runtime system.

7.1 Test Platform

The system used to generate the results I present in this chapter is an Intel Core i7

Q720M quad-core CPU running at 1.6GHz with 4GB of RAM and two 320GB 7200RPM

SATAII hard-disks in a non-RAID configuration. The Core i7 processor features Intel's
Hyper-Threading Technology and so presents 8 logical processing units to the operating
system while having only 4 physical cores. The system runs Windows 7 64-bit

ating system while having only 4 physical cores. The system runs Windows 7 64-bit

Professional Edition with the release versions of the .NET Framework 4 and Visual

Studio 2010.


7.2 The Examples

This chapter focuses on four specific examples: a ray tracer, a calculator, an example of

the spectral methods parallelism dwarf, and a bank transaction processing system. In

this section, I introduce each of these examples in detail and provide code snippets of the

most interesting and relevant parts (from an ownerships and parallelism perspective)

of the examples. The complete source for these sample applications is available from

the MQUTeR website along with the Zal compiler [32].

Some readers may observe that these examples are traditionally regarded as programs

written by specialist programmers. While this is true, these programs are representative

of much broader classes of programs which are written by non-specialist programmers.

These examples, therefore, serve the purpose of demonstrating the operation of the

system when applied to real problem types.

7.2.1 Ray Tracer

The first example I have chosen is a ray tracing application taken from Microsoft’s

“Samples for Parallel Programming with the .NET Framework 4” [82]. This ray tracer

renders a simple animated scene of a ball bouncing up and down as shown in Figure 7.1.

Note that the scene includes reflective surfaces which increase the computational
complexity of the rendering.

While the ray tracer is embarrassingly parallel by design, discovering that parallelism in

a sequential implementation is not trivial due to the complexity of the methods needed

to perform the rendering. In this example, the main source of parallelism is the loop

which traces a ray of light for each pixel in the rendered scene. The Microsoft-supplied
implementation has two versions of this rendering loop. The first uses two nested
loops to traverse the screen pixels sequentially, computing the color for each pixel in

the resulting scene as shown in Listing 7.1. The second version uses the parallel for

loop from the .NET Framework 4 to parallelize the outer for loop from the sequential

implementation.


Figure 7.1: The scene rendered by the ray tracer from Microsoft's Samples for Parallel Programming with the .NET Framework 4; note the reflective surfaces which increase the rendering complexity.

internal void Render(Scene scene, int[] scr) {
    Camera camera = scene.Camera;
    for (int y = 0; y < screenHeight; y++) {
        int stride = y * screenWidth;
        for (int x = 0; x < screenWidth; x++) {
            scr[x + stride] = TraceRay(new Ray(camera.Pos,
                GetPoint(x, y, camera)), scene, 0).ToInt32();
        }
    }
}

Listing 7.1: The original C# Render method with its doubly-nested for loop.

7.2.2 Calculator

Tree traversal algorithms are found in a number of areas of computer science including

search, compilers, and databases. Many tree traversal algorithms, when written
sequentially, contain at least some inherent parallelism and, in many cases, this inherent

parallelism takes the form of task parallelism. Because of the prevalence and utility

of tree data structures and the algorithms for their traversal, I have chosen a simple

calculator application to demonstrate the detection and exploitation of task parallelism

by my system.

The example calculator application reads in simple mathematical expressions written

using prefix notation and evaluates them. This is done by reading in an expression as


a string, scanning the string to produce tokens, parsing the token stream to produce

an Abstract Syntax Tree (AST), and then using the AST to compute the value of the

expression. The computation of the value of the expression can be achieved by using a

post-order traversal of the AST.

The major source of inherent parallelism in this example is found in the traversal of

the calculator’s AST nodes with two or more children; the traversal of the children can

be carried out in parallel. An example of such a node in the calculator example is the

BinaryOperation AST node, which represents binary arithmetic operators such as +
and −. Listing 7.2 shows the implementation of the abstract AST Node class as well
as the BinaryOperation node.

abstract class Node {
    public abstract int Compute();
}

class BinaryOperation : Node {
    char opType;
    Node left, right;

    public BinaryOperation(char opType, CalculatorToken[] left,
                           CalculatorToken[] right) {
        this.opType = opType;
        this.left = CalculatorParser.Parse(left);
        this.right = CalculatorParser.Parse(right);
    }

    public override int Compute() {
        int left = this.left.Compute();
        int right = this.right.Compute();
        switch (opType) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
        }
        throw new InvalidOperationException();
    }
}

Listing 7.2: The C# implementations of the calculator AST Node class and BinaryOperation class.
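The task parallelism latent in Compute — the left and right subtrees can be evaluated concurrently — can be sketched in Java with the fork/join framework. The classes below are simplified stand-ins for the calculator's AST nodes (including a hypothetical Literal leaf node), not the thesis's Zal code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Simplified stand-ins for the calculator's AST nodes.
abstract class Node { abstract int compute(); }

class Literal extends Node {
    final int value;
    Literal(int value) { this.value = value; }
    int compute() { return value; }
}

class BinaryOperation extends Node {
    final char opType;
    final Node left, right;
    BinaryOperation(char opType, Node left, Node right) {
        this.opType = opType; this.left = left; this.right = right;
    }
    int compute() {
        // Fork the left subtree so it runs concurrently with the right one.
        RecursiveTask<Integer> leftTask = new RecursiveTask<Integer>() {
            protected Integer compute() { return left.compute(); }
        };
        leftTask.fork();
        int r = right.compute();   // evaluate the right subtree on this thread
        int l = leftTask.join();   // synchronize on the forked result
        switch (opType) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new IllegalStateException();
        }
    }
}

public class CalcSketch {
    static int eval(Node root) {
        return ForkJoinPool.commonPool().invoke(new RecursiveTask<Integer>() {
            protected Integer compute() { return root.compute(); }
        });
    }
    public static void main(String[] args) {
        // (2 + 3) * (10 - 4) = 30
        Node ast = new BinaryOperation('*',
            new BinaryOperation('+', new Literal(2), new Literal(3)),
            new BinaryOperation('-', new Literal(10), new Literal(4)));
        System.out.println(eval(ast)); // 30
    }
}
```

In the thesis's system this fork/join structure is not written by hand; it is the kind of schedule the compiler can generate once the effect system has shown the two subtree traversals to be independent.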


7.2.3 Bank Transaction System

There are often cases where full loop parallelism is not possible due to data
dependencies. It may be possible, in these cases, to extract some parallelism through the

use of pipelining to stagger the execution of the loop body. I wrote a simplified bank

transaction processing application as an example of this style of parallelism.

The simplified bank in this transaction system consists of a vault full of accounts, an

authentication system, and transactions. The bank has a method which accepts a list

of transactions which transfer money between two accounts. This method traverses

the list of transactions with a foreach loop applying the changes to the accounts in

the vault after validating the transaction. Listing 7.3 shows the Transaction structure

and the method which processes the transactions.

7.2.4 Spectral Methods

Fine-grained parallelism is another important form of parallelism and while my system

has not been specifically designed to detect and exploit traditional parallelism, there

are still a number of cases where it can do so. To demonstrate this, I now present an

example of an application which performs two 1-Dimensional Fast Fourier Transforms

(FFTs) on a matrix of values. This application is one of the examples provided as part

of the Parallelism Dwarfs project [120].

In this example, the main source of parallelism comes from the application’s FFT

computation loop which is shown in Listing 7.4. The Parallel Dwarfs project supplies

a number of different implementations of the same algorithm including a sequential

C# version and a C# version written using the .NET Framework 4's Task Parallel
Library [80].


struct Transaction {
    AccountInfo src, dest;
    int amount;
    bool authenticated = false;
    bool validated = true;

    public Transaction(string srcAccountCode, string srcAccountPIN,
                       string destAccountCode, int amount) {
        src = new AccountInfo(srcAccountCode, srcAccountPIN);
        dest = new AccountInfo(destAccountCode, null);
        this.amount = amount;
    }

    public void Authenticate(Authenticator auth) {
        authenticated = auth.Authenticate(src.AccountCode, src.AccountPIN);
    }

    public void ValidateSource(Vault vault) {
        validated = validated && Account.VerifyAccountCode(src.AccountCode);
    }

    public void ValidateDespt(Vault vault) {
        validated = validated && Account.VerifyAccountCode(dest.AccountCode);
    }

    public void Apply(Vault vault) {
        if (authenticated && validated)
            vault.Transfer(src.AccountCode, dest.AccountCode, amount);
    }
}

class Bank {
    ...
    public void Apply(List<Transaction> transactions) {
        foreach (Transaction transaction in transactions) {
            transaction.Authenticate(auth);
            transaction.ValidateSource(vault);
            transaction.ValidateDespt(vault);
            transaction.Apply(vault);
        }
    }
}

Listing 7.3: The original C# implementation of the bank transaction system's Transaction and the Bank's transaction processing method.


struct Complex { public double real, imag; ... }

class Solver {
    public int length;
    Complex[][] complexArray;

    public void Solve() {
        // Transform the rows
        foreach (Complex[] row in complexArray)
            row.FFT();
        complexArray.transpose();

        // Transform the columns
        foreach (Complex[] row in complexArray)
            row.FFT();
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    public static void FFT(this Complex[] complexLine) {
        Complex[] W = new Complex[length];
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}

Listing 7.4: A fragment of the sequential C# spectral methods example's key data structures and computational methods.
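What makes this loop parallel is that each iteration reads and writes only its own row. The Java sketch below shows that loop shape with a trivial per-row transform standing in for the FFT (the method name and data are illustrative assumptions); running it in parallel produces the same result as running it sequentially precisely because the iterations are independent.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class RowParallelSketch {
    // Stands in for FFT(): any transform that reads and writes only its own row.
    static void transformRow(double[] row) {
        for (int i = 0; i < row.length; i++) row[i] = row[i] * 2 + 1;
    }

    static double[][] run(boolean parallel) {
        double[][] m = { {1, 2}, {3, 4}, {5, 6} };
        IntStream rows = IntStream.range(0, m.length);
        if (parallel) rows = rows.parallel();
        // Each iteration touches a distinct row, so there are no
        // cross-iteration dependencies and the loop may run in parallel.
        rows.forEach(r -> transformRow(m[r]));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepEquals(run(true), run(false))); // true
    }
}
```

The effect system's job is to establish this row-disjointness automatically from the ownership annotations rather than by inspection.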


7.3 The Annotation Process

Having selected the C] applications to use as examples, the next step was to annotate

these programs with context parameters and effect declarations. There are a number

of possible approaches to annotating a program with ownership information. In this

section, I outline the heuristic process I used which is iterative in nature and is a process

of refinement. At present this is a manual process undertaken by the programmer. This

process could be as much work as manually parallelizing the program. However, this

process has two advantages: (1) it is a more mechanical process that requires less

specialized parallelism training and (2) this process could be easily adapted for use in a

compiler or integrated development environment (IDE) to relieve the programmer of at

least some of the annotation overhead. I use the code fragment shown in Listing 7.5 as

a running example throughout the following discussion to demonstrate the algorithm’s

operation.

public class Result {
    private String message;
    private Object value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}

Listing 7.5: The original sample code used as a running example to show the operation of my proposed ownership annotation heuristic.

The initial step for annotating a program is to find all of the classes which do not

reference other classes in the project. These classes gain a single context parameter

which is the owner of the class and the data it contains. In the running example

introduced in Listing 7.5, the Result class is an example of a class which does not

reference other classes in the project. It is initially annotated with a single owner

context parameter as shown in Listing 7.6.

public class Result[o] {
    private String message;
    private Object value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}

Listing 7.6: The first step of annotating the Result class, adding the owner.

A data flow analysis is performed for each field which holds an object reference to
determine if it holds a value generated by the class or a value passed in. If the field
holds only values generated by the class, its type is annotated with the current class's
this context as an owner. Otherwise, an additional context parameter is added to the

class declaration to be used as the field’s owning context. In the running example, the

value field of the Result class is annotated as being owned by an additional context

parameter since its value can be generated outside the class. The message field is

assigned to only in the Result constructor and so is owned by this (see Listing 7.7).

public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper {
    public Result res;
}

Listing 7.7: The completion of the Result class’s annotation.

Once all of these “low level” classes have been annotated with owners, annotation

proceeds with classes which reference only classes which have already been annotated.

Continuing the example, the ResultWrapper class is annotated with a single owner as

shown in Listing 7.8.

Next, the ResultWrapper fields are analyzed. The res field is publically accessible and

so its value can be generated outside the current class and so context parameters are

added to the ResultWrapper class as shown in Listing 7.9.

public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper[o] {
    public Result res;
}

Listing 7.8: The first step of annotating the ResultWrapper class.

public class Result[o,v] {
    private String|this| message;
    private Object|v| value;
    public Result()
        { message = "Result Message"; }
    public void SetValue(Object|v| value)
        { this.value = value; }
}

public class ResultWrapper[o,r,s] {
    public Result|r,s| res;
}

Listing 7.9: The final result of annotating the code shown in Listing 7.5.

Once the preceding naïve annotation algorithm has been run, the annotated program
has a large number of context parameters. This large number of context parameters may

be unwieldy and may not capture data relationships which the programmer knows to be

true. To reduce the number of context parameters and capture data relationships known

from the application design, each of the annotated classes is revisited and ownership

contexts merged where possible. For example, the o and v on the Result class in

Listing 7.9 might be merged if it was known that the value and Result objects are from

the same part of the program’s representation. The merging of context parameters is

a trade-off between flexibility and simplicity and so the choices made may need to be

revisited during the development of a system as it evolves.

Once the initial annotation addition and rationalization passes have been completed,

the program is compiled. The compiler output can be examined to determine where

parallelism has been found. If areas of computational intensity have not been parallelized,
the reasons for the parallelization failure can be determined. At present this

information is available only by manually debugging the compilation of the program,

but it would be possible to expose this information as part of the compiler’s output or

through a customized IDE. This information can assist the programmer in determining


where code needs to be refactored or where context annotations have been rationalized

excessively causing the loss of too much information.

As discussed in Section 3.4.3, the relationships between the contexts read and written

by a code fragment must be known before it can be safely parallelized. In some cases,

the compiler may not be able to statically determine the relationship between context

parameters. In these cases it emits a runtime context relationship check. These checks

can be performed efficiently, as previously discussed in Section 3.4.4. However, if the

programmer knows the relationship should hold, a context constraint can be added to

capture this fact. These constraints reduce the flexibility of the code, but eliminate the

need for runtime context relationship checks. Whether context constraints should be

used is for the programmer to decide.
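The shape of such a guarded parallelization can be sketched in Java. The disjoint helper below is a hypothetical stand-in for the runtime context relationship test of Section 3.4.4 (modelling a context as a set of object identifiers), and the loop body is illustrative; neither is the compiler's actual generated code.

```java
import java.util.Set;
import java.util.stream.IntStream;

public class GuardSketch {
    // Hypothetical stand-in for the runtime context relationship test:
    // two contexts are disjoint if their runtime-tracked object sets
    // do not intersect.
    static boolean disjoint(Set<Integer> ctxA, Set<Integer> ctxB) {
        for (Integer o : ctxA) if (ctxB.contains(o)) return false;
        return true;
    }

    static long sumSquares(int[] data, boolean contextsDisjoint) {
        IntStream s = IntStream.range(0, data.length);
        // The generated code runs the parallel version only when the
        // runtime relationship check succeeds, and otherwise falls back
        // to the original sequential loop.
        if (contextsDisjoint) s = s.parallel();
        return s.mapToLong(i -> (long) data[i] * data[i]).sum();
    }

    public static void main(String[] args) {
        boolean ok = disjoint(Set.of(1, 2), Set.of(3, 4));
        System.out.println(sumSquares(new int[]{1, 2, 3}, ok)); // 14
    }
}
```

Adding a static context constraint corresponds to proving the guard true at compile time, so the check (and the sequential fallback) can be omitted entirely.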

It is possible that many of the annotations required by my system could be removed

or further simplified in future through the implementation of an ownership inference

system. The initial annotation process is highly procedural and so would be quite

amenable to automation. Note that the burden of program annotation is not unique

to my system, but is common to all ownership-based systems, and there is a significant

amount of research aimed at reducing this burden.

7.4 The Results of Annotation

I applied the annotation process described in the previous section to the four examples

introduced in Section 7.2. In this section I begin by presenting a summary of statistics

related to the annotation process. This is followed by a discussion of the annotation of

each of the examples individually.

When adding additional syntax to a programming language, the burden imposed on

the programmer needs to be minimized and the benefit derived from the additional

information maximized. To measure the syntactic overhead of the ownership and effect

annotations required, I have calculated two statistics. The first is a measure of the

total number of lines of code that had to be modified with context parameters or effect

declarations. Table 7.1 shows the total logical lines of code per example as well as the

number and percentage of lines modified.


Logical Source Lines of Code

Example                     Total    Modified    Percentage (%)
Ray Tracer                    249          41             16.7%
Calculator                    161          43             26.7%
Bank Transaction System       124          28             22.5%
Spectral Methods               81           8              9.9%
Total                         615         120             19.5%

Table 7.1: Table showing the number of logical lines of source code modified for each of the examples during the annotation process.

The second is a measure of the total number of method definitions that had to be

modified with either context parameters, effect declarations, or both. Compared to the

context annotations, the effect annotations required on virtual methods can be more

onerous for the programmer to compute. Table 7.2 shows the number of method definitions
in each example as well as the number and percentage of those definitions which

had to be annotated. On average, roughly a quarter of method declarations required

some form of annotation, but the complexity of these annotations varied greatly. Only

methods declared to be virtual can be overridden and so require programmer effect

declarations; the side-effects of all other methods can be computed from the method

body. This means that only virtual methods and those requiring context parameters

require annotation.

Method Definitions

Example                     Total    Modified    Percentage (%)
Ray Tracer                     52           8             15.3%
Calculator                     16          11             68.8%
Bank Transaction System        20           6               30%
Spectral Methods               10           2               20%
Total                          98          27             27.6%

Table 7.2: Table showing the number of method definitions modified for each of the examples during the annotation process.

These statistics show that the annotation overhead of my proposed system is reasonable,
especially since my system is a prototype with a number of avenues for further simplification.

Having presented the statistics, I now discuss the annotation of each example
individually. I focus on the code that is the major source of parallelism in the examples as

previously discussed in Section 7.2.


7.4.1 Ray Tracer

The entire ray tracing application was annotated with ownership information. The

parallelism found by the Zal compiler using my proposed sufficient conditions coincided

with the known sources of parallelism in the application.

The main source of parallelism is the Render method. This method was annotated

with two context parameters to represent the owners of the scene and the array of

pixels. In addition to these annotations, I rewrote the doubly-nested for loop as a

single enhanced foreach loop to make it clearer that the loop iterates over the

elements of the scr array. Following the initial compilation of the method, the Render

method’s loop was parallelized conditional on the array’s owner, t, being disjoint from

this and s. After exploring the design of the program, I decided to add a static context

constraint to the Render method capturing this context relationship. Listing 7.10 shows

the end result of the annotation and loop rewriting operations.

internal void Render[s,t](Scene|s| scene, int[]|t| scr)
        reads<this,s,t> writes<t> where t # this,s {
    Camera|s| camera = scene.Camera;
    foreach (ref int pixel at int Index in scr) {
        pixel = TraceRay|s|(new Ray(camera.Pos,
            GetPoint|s|(Index % screenWidth, Index / screenWidth, camera)),
            scene, 0).ToInt32();
    }
}

Listing 7.10: The Zal implementation of the original Render method shown in Listing 7.1; note the loop has been rewritten as an enhanced foreach loop.

7.4.2 Calculator

The annotation of the Node and BinaryOperation classes in the calculator example

was undertaken following the algorithm described in Section 7.3. I added an owner

to the Node class and to the BinaryOperation class. It was also necessary to add an
additional context parameter to the BinaryOperation class to represent the owner of the
tokens used to build the object. After the initial compilation of the BinaryOperation,

I examined the calls to compute the values of the left and right expressions as part

of the Compute method. The compiler indicated that the two calls were not able to be

run in parallel because they shared the same owner. I, therefore, decided to introduce


two subcontexts to hold the subexpressions and allow task parallelism to be exploited.

Listing 7.11 shows the final annotated version of the Compute method.

abstract class Node[owner] {
    public abstract int Compute() reads <this> writes <>;
}

class BinaryOperation[owner,tokOwner] : Node|owner| {
    subcontext sub_left, sub_right;
    char opType;
    Node|sub_left| left;
    Node|sub_right| right;

    public BinaryOperation(char opType, CalculatorToken[]|tokOwner| left,
            CalculatorToken[]|tokOwner| right)
            reads <tokOwner> writes <tokOwner> {
        this.opType = opType;
        this.left = CalculatorParser.Parse|sub_left,tokOwner|(left);
        this.right = CalculatorParser.Parse|sub_right,tokOwner|(right);
    }

    public override int Compute() reads <this> writes <> {
        int left = this.left.Compute();
        int right = this.right.Compute();
        switch (opType) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
        }
        throw new InvalidOperationException();
    }
}

Listing 7.11: The Zal implementation of the calculator AST Node class and BinaryOperation class.

7.4.3 Bank Transaction System

In the Bank Transaction System, I focused on the Bank’s Apply method because it is

the main source of parallelism in the application. I annotated this method by adding

a context parameter to the method, to represent the owner of the transaction list

supplied as a parameter. After my initial compilation of the method, the foreach

loop was implemented sequentially. Additional development revealed the need to have

separate owners of the Vault and Authenticator references, and so I added the sub_v
and sub_a subcontexts. The overall result of these annotations is shown in Listing 7.12.


class Bank[owner] {
    subcontext sub_v, sub_a;
    ...
    public void Apply[listOwner](List<Transaction>|listOwner| transactions)
            reads <this,listOwner> writes <this> {
        foreach (Transaction transaction in transactions) {
            transaction.Authenticate|sub_a|(auth);
            transaction.ValidateSource|sub_v|(vault);
            transaction.ValidateDest|sub_v|(vault);
            transaction.Apply|sub_v|(vault);
        }
    }
}

Listing 7.12: The Zal implementation of the Bank’s transaction processing method.

7.4.4 Spectral Methods

The main source of parallelism in the spectral methods example is the loop which applies

a Fast Fourier Transform to each of the rows in the complexArray. Listing 7.13 shows

a snippet of the example annotated with context parameters. The two dimensional

complexArray has an owner for each of the two dimensions; one owner for the array

containing arrays and one for the arrays containing values. The data contained in the

array is part of the solver representation, but to separate the array’s structure from the

array’s data, I needed to add two subcontexts to the Solver as shown in the listing.

7.4.5 Summary

Overall, approximately a quarter of the sample programs required some form of an-

notation with the system I currently propose. This burden could be further reduced

by the addition of even a basic ownership inference system and more helpful defaults.

Given that much of the information contained in these annotations is already part of

the overall program design, I have shown that the safe automatic parallelization of

sequential programs is possible using my system; it remains to refine the techniques I

have proposed.


struct Complex { public double real, imag; ... }

class Solver[owner] {
    subcontext rowContext, colContext;
    public int length;
    Complex[]|rowContext|[]|colContext| complexArray;

    public void Solve() reads <this> writes <this> {
        // Transform the rows
        foreach (Complex[]|rowContext| row in complexArray)
            row.FFT|colContext|();
        complexArray.transpose();

        // Transform the columns
        foreach (Complex[]|rowContext| row in complexArray)
            row.FFT|colContext|();
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    public static void FFT[o](this Complex[]|o| complexLine)
            reads <this,o> writes <this> {
        Complex[]|o| W = new Complex[length]|o|;
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}

Listing 7.13: The annotated version of the Spectral Methods example.

7.5 The Results of Compilation

The Zal compiler emits C# code which adds runtime ownership tracking to the generated

classes and which exposes exploitable inherent parallelism using Microsoft’s Task Par-

allel Library (TPL) [80]. In this section I present the results of compiling the programs

annotated in the previous section. This presentation has two main goals: (1) to show that the major sources of inherent parallelism in the examples are explicitly exposed by the compiler implementation and (2) to give the reader a feel for how Zal programs are implemented using the TPL in C# (see Chapter 6).

7.5.1 Ray Tracer

The annotated ray tracer program shown in Listing 7.1 was compiled to C] using the

Zal compiler. When the Zal compiler generates C] source code, it stores formal context


parameters and effect information in custom C# attributes. The C# compiler, in turn, stores these attributes in the executable code it produces. This information is retained so that if a Zal program later links against the compiled code, that code's ownership and effect information remains available, facilitating both type checking and effect computation.

Listing 7.14 shows the C# code generated from the Zal implementation of the Render method. Notice that the compiler has transformed the enhanced foreach loop into a parallel version implemented as an extension method, as discussed in Section 6.4.2.

[ReadEffect("this", "s", "t"), WriteEffect("t"),
 FormalContextParameters("s", "t")]
internal void Render(Scene scene, int[] scr, IOwnership Context_s,
        IOwnership Context_t) {
    Camera camera = scene.Camera;
    scr.ParallelLoop((ref int pixel, int Index) =>
    {
        pixel = TraceRay(new Ray(camera.Pos, GetPoint(
                Index % screenWidth, Index / screenWidth,
                camera, Context_s)),
            scene, 0, Context_s).ToInt32();
    });
}

Listing 7.14: The C# implementation of the Zal Render method shown in Listing 7.1.

7.5.2 Calculator

Listing 7.15 shows the output of the Zal source-to-source compiler. Note the task par-

allelism that has been explicitly exposed by a source-to-source transformation applied

by the compiler to the computation of the values of the left and right subexpressions

of the BinaryOperation. Also note the subcontext fields in the emitted class.

7.5.3 Bank Transaction System

The main source of parallelism in the bank transaction system can be found in the

processing of transactions. The loop which processes transactions is not amenable to

full parallelization due to writes to shared mutable state; however, the loop is suitable

for pipelining. Listing 7.16 shows how the loop of interest is parallelized by the Zal

compiler. Notice that the loop has been transformed into a multi-stage pipeline.
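Pipelining of this kind can be illustrated independently of the Zal toolchain. The sketch below, written in Java rather than the generated C#, connects two stages with a blocking queue; the transaction-id scheme, stage bodies, and class name are hypothetical stand-ins for the authenticate/apply calls, not the compiler's actual runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    // Sentinel marking the end of the stream (assumes real ids never use it).
    private static final int POISON = Integer.MIN_VALUE;

    // Two-stage pipeline over transaction ids: stage 1 "authenticates" and
    // hands each id on; stage 2 "applies" ids in arrival order, so writes
    // to shared state stay sequential within the stage. Returns apply order.
    public static List<Integer> run(List<Integer> ids) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(16);
        List<Integer> applied = new ArrayList<>();

        Thread stage1 = new Thread(() -> {
            try {
                for (int id : ids) {
                    // authenticate(id) would run here
                    q.put(id);
                }
                q.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread stage2 = new Thread(() -> {
            try {
                for (int id = q.take(); id != POISON; id = q.take()) {
                    applied.add(id); // apply(id) would run here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        stage1.start();
        stage2.start();
        stage1.join();
        stage2.join();
        return applied; // order is preserved: [1, 2, 3] for input [1, 2, 3]
    }
}
```

Different transactions can occupy different stages at the same time, which is where the parallelism comes from, while each stage's writes to its own shared state remain sequential.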


[FormalContextParameters("owner", "tokOwner")]
class BinaryOperation : Node, IOwnership {
    private int depth;
    private object owner;
    [SubcontextField]
    private IOwnership Context_sub_right;
    [SubcontextField]
    private IOwnership Context_sub_left;
    char opType;
    Node left, right;

    [ReadEffect("this"), WriteEffect()]
    public override int Compute() {
        int left = 0;
        int right = 0;
        Task[] tasks = new Task[] {
            Task.Factory.StartNew(() => { left = this.left.Compute(); }),
            Task.Factory.StartNew(() => { right = this.right.Compute(); })
        };
        Task.WaitAll(tasks);
        switch (opType) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            case '%': return left % right;
        }
        throw new InvalidOperationException();
    }
    ...
}

Listing 7.15: The implementation of part of the Zal calculator example in C#; note the task parallelism in the Compute method.

7.5.4 Spectral Methods

Finally, I conclude by presenting the results of compiling the spectral methods example.

The compiler successfully detects that the two foreach loops applying the FFT row-

wise to the matrix can be executed in parallel and transforms the loops appropriately.

Note that the two dimensions of the complexArray in the Solver method are given

different owners; without this, the sufficient conditions for parallelism could not be met and the compiler would conservatively conclude that the loops must run sequentially. The

compiler output for the fragment of the example previously presented in Listings 7.4

and 7.13 is shown in Listing 7.17.


[FormalContextParameters("owner")]
class Bank : IOwnership {
    ...
    [ReadEffect("this", "listOwner"), WriteEffect("this"),
     FormalContextParameters("listOwner")]
    public void Apply(List<Transaction> transactions,
            IOwnership Context_listOwner) {
        Parallel.Pipeline.AddFirstStage(
            (Transaction transaction) => {
                transaction.Authenticate(auth, Context_sub_a);
                return transaction;
            }, transactions)
            .AddStage((Transaction transaction) => {
                transaction.ValidateSource(vault, Context_sub_v);
                return transaction;
            })
            .AddStage((Transaction transaction) => {
                transaction.ValidateDest(vault, Context_sub_v);
                return transaction;
            })
            .AddStage((Transaction transaction) => {
                transaction.Apply(vault, Context_sub_v);
                return transaction;
            })
            .Run<Transaction>();
    }
}

Listing 7.16: The implementation of the Zal transaction processing method in C#; note the pipelined foreach loop.

7.6 Performance

Having selected examples representative of a broad spectrum of different inherent par-

allelism patterns, I have demonstrated that my proposed system can detect and exploit

the major sources of parallelism contained in these examples. It now remains to mea-

sure the runtime and memory overheads involved in exploiting the parallelism detected

which is the focus of this section.

It is important to note that the runtime ownership tracking systems have not been

fully optimized. The most obvious opportunities for performance improvement have

been taken, but there are still a number of areas where additional performance may

yet be extracted. It is also important to remember that runtime ownership tracking

and context relationship testing is not required. In the absence of a runtime ownership

tracking system, any runtime relationship tests made by programs will fail, causing the program to rely solely on static reasoning.


struct Complex { public double real, imag; ... }

[FormalContextParameters("owner")]
class Solver {
    public int length;
    Complex[][] complexArray;
    private IOwnership Context_colContext;
    private IOwnership Context_rowContext;

    public Solver() : base() {
        Context_colContext = new SubContext(this);
        Context_rowContext = new SubContext(this);
    }

    [ReadEffects("this"), WriteEffects("this")]
    public void Solve() {
        // Transform the rows
        Parallel.ForEach(complexArray, (row) => row.FFT(Context_colContext));
        complexArray.transpose();

        // Transform the columns
        Parallel.ForEach(complexArray, (row) => row.FFT(Context_colContext));
        complexArray.transpose();
    }
}

public static class FFTHelpers {
    [ReadEffects("this", "o"), WriteEffects("this")]
    public static void FFT(this Complex[] complexLine,
            IOwnership Context_o) {
        Complex[] W = new Complex[length];
        Context_o.AddChild(W);
        for (int i = 0; i < complexLine.Length; ++i) {
            W[i].real = Math.Cos(-((2 * i * Math.PI) / length));
            W[i].imag = Math.Sin(-((2 * i * Math.PI) / length));
            ...
        }
        ...
    }
}

Listing 7.17: The compiler output for the Zal implementation of the Spectral Methods example.

As stated earlier in this chapter, the goal of these measurements is not to demonstrate an overall performance improvement on all of these examples, since not all of the detected parallelism is necessarily worth exploiting. The goal of this thesis is

to identify parallelism; determining when to exploit it remains an important open but

separate research problem.


7.6.1 Runtime Overhead

In this section, I present measurements of the overhead of my proposed runtime own-

ership tracking system which can be used to supplement the statically known context

relationships. I have compiled all four annotated examples using the Zal compiler to

produce two different implementations: one using the simple parent pointer algorithm

and one using Dijkstra Views. Figures 7.2, 7.3, 7.4, and 7.5 show the speed up graphs

for the ray tracer, calculator, bank transaction system, and spectral methods examples

respectively. Each of these speed-up graphs includes an ideal speed-up line showing how a perfectly parallel algorithm with no sequential component would scale. These graphs show

the logical number of cores since the testing was performed on a hyper-threaded Core

i7 system and in the ideal case there should be no observable difference between the

number of logical and physical computational cores. The graphs also feature a per-

formance measurement of a version of each example that has been hand parallelized

using Microsoft’s TPL technology. This provides a reference point for determining the

overhead introduced by the runtime ownership tracking systems I have implemented.

The bank transaction system and calculator examples do not benefit from parallelization overall, but the key point is that the ownership tracking system has only a very small impact on program execution time in these cases. To make

the overheads of the different systems more clear, Figures 7.6, 7.7, 7.8, and 7.9 show the

overheads as percentages of the hand parallelized execution time. Note that the differ-

ence is due solely to the overhead of the runtime ownership tracking system because

both the hand parallelized and automatically parallelized versions of the applications

used the same Task Parallel Library constructs to effect the parallelization.

Overall, the worst overhead measured for the pointer-based runtime ownership system was 10%, and in many cases its overhead was of the same order as the noise in the sample times. Each reported time is the arithmetic mean of 30 repetitions of the measurement after the 3 fastest and 3 slowest repetitions were discarded. The runtime overhead of the Dijkstra Views based system

was approximately 20% in the worst case, though it did perform very well when applied

to the calculator example. From the results, the pointer based ownership tracking

system was the better of the two tracking systems for all of the examples studied. The


Dijkstra Views system would likely be better in cases where the ownership trees are

tall and the number of object relationship tests high.
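The measurement protocol just described amounts to a trimmed mean. A minimal sketch, assuming nothing about the benchmarking harness itself (the thesis used k = 3 over 30 repetitions):

```java
import java.util.Arrays;

public class TrimmedMean {
    // Mean of the samples after discarding the k smallest and k largest
    // values, mirroring the "drop the 3 fastest and 3 slowest of 30"
    // protocol described in the text.
    public static double trimmedMean(double[] samples, int k) {
        if (samples.length <= 2 * k) {
            throw new IllegalArgumentException("need more than 2k samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = k; i < sorted.length - k; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2 * k);
    }

    public static void main(String[] args) {
        double[] times = {9.8, 10.1, 10.0, 50.0, 9.9, 0.1, 10.2, 10.0};
        // With k = 1 the outliers 0.1 and 50.0 are discarded.
        System.out.println(trimmedMean(times, 1)); // ≈ 10.0
    }
}
```

Trimming makes the reported mean robust to occasional outliers such as a GC pause or OS scheduling hiccup, which is why it is a common choice for micro-benchmark reporting.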

7.6.2 Memory Overhead

Having discussed the runtime overhead of my proposed ownership tracking system,

it remains to explore the memory overhead incurred by these techniques. The ray

tracer and spectral methods examples are computationally bound problems which do

not create a significant number of objects and so are not used as part of this section

focused on measuring the memory overhead of each of the runtime ownership tracking

systems.

Figures 7.10 and 7.11 show the relative memory overheads for the two different own-

ership tracking systems in the calculator and bank transaction systems respectively.

These overhead measurements are made relative to the hand coded version using the

TPL. In the calculator example, the objects allocated in the program form a binary

tree. As the number of nodes increases so does the height of the binary tree and the

ownership hierarchy. The increase in the ownership tree’s height causes a proportional

increase in the amount of memory consumed by the Dijkstra Views ownership tracking implementation, as expected. Overall, the memory overhead of the pointer based

system was approximately 15% on average with the Dijkstra Views implementation

adding approximately a 30% memory overhead. These numbers could be further re-

duced with additional optimization including CLR and JIT support for the runtime

ownership tracking system. The pointer based system’s memory overhead could be

reduced by not caching an object’s depth in the ownership tree in the object, but this

would come at the cost of reduced runtime performance.
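The parent-pointer scheme and its depth-caching trade-off can be sketched as follows; the class and method names are illustrative, written in Java rather than C#, and not the thesis' actual runtime API.

```java
// Parent-pointer ownership tracking: each context stores its parent and
// caches its depth in the ownership tree. The cached depth costs one int
// per object but lets an ancestor test walk only the deeper node upward
// instead of scanning all the way to the root.
public class OwnershipContext {
    private final OwnershipContext parent;
    private final int depth; // cached; dropping it saves memory but slows tests

    public OwnershipContext(OwnershipContext parent) {
        this.parent = parent;
        this.depth = (parent == null) ? 0 : parent.depth + 1;
    }

    // True if 'ancestor' transitively owns this context (or is it).
    public boolean isInside(OwnershipContext ancestor) {
        OwnershipContext c = this;
        // Walk up exactly (this.depth - ancestor.depth) links at most.
        while (c != null && c.depth > ancestor.depth) {
            c = c.parent;
        }
        return c == ancestor;
    }
}
```

For example, with contexts world > bank > vault, `vault.isInside(world)` walks two parent links and returns true, while `world.isInside(vault)` terminates immediately because world is already shallower than vault.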


7.7 Summary

In this chapter I have presented several worked examples covering a variety of different

types of exploitable inherent parallelism. I have demonstrated that the major sources

of parallelism in these examples can be detected and exposed automatically when the

program is appropriately annotated with my type and effect system. On average, ap-

proximately 25% of the sample program source code required annotation with context

parameters or effect declarations. The small examples, the calculator and bank transac-

tion systems, had a higher percentage of lines that required annotation than the larger

more complex examples. The higher annotation percentages for the smaller examples

probably represent the worst case annotation overhead since larger methods amortize

the annotation cost across more lines of code.

Using my proposed system, all of the major sources of parallelism were found, producing solutions that scaled in the same way as the hand-parallelized versions did. The runtime overhead of ownership tracking was between 10% and 20% in the

worst case with memory overhead in the region of 15% to 30% for the pointer chasing

ownership tracking system. Overall in this chapter, I have successfully implemented

a proof of concept system which has demonstrated that my proposed system is not

prohibitively expensive.


Figure 7.2: Speed-up graph for the ray tracer example.

Figure 7.3: Speed-up graph for the calculator example.


Figure 7.4: Speed-up graph for the bank transaction processing system.

Figure 7.5: Speed-up graph for the spectral methods example.


Figure 7.6: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the ray tracer example.

Figure 7.7: Graph showing the speedup in the calculator example when the runtime ownership tracking systems are enabled.


Figure 7.8: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the simplified bank transaction system.

Figure 7.9: Graph showing the overhead of the two possible runtime ownership tracking systems when applied to the spectral methods example.


Figure 7.10: Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. The O(n) memory usage of the Dijkstra Views implementation can be clearly seen as the number of nodes increases.

Figure 7.11: Graph showing the memory overhead of the pointer and Dijkstra Views based runtime ownership tracking systems. Note that the height of the ownership tree does not increase with the data size in this example, and so the Dijkstra Views memory consumption grows proportionally to the number of transactions.


Chapter 8

Comparison with Related Work

Reasoning about data dependencies, side-effects of evaluation and the presence of in-

herent parallelism have long been active areas of research in the field of computer

science. In this thesis I propose a system for reasoning about side-effects of evaluation

in an abstract and composable manner. I apply this system to the problem of finding

inherent parallelism in sequential programs. There is, therefore, a considerable body

of existing literature directly related to this work. In this chapter I explain how my

system compares and contrasts with other related systems documented in the literature

and how my system addresses issues not addressed by others. Traditionally, related

work chapters are presented early in a thesis, but I chose to defer this discussion so

that my comparisons could involve the full technical details of the systems concerned

which would not have been possible at the start of this thesis.

Current approaches to exploiting parallelism in programs can be classified into two

broad groups: speculative approaches and non-speculative approaches. The speculative

approaches execute portions of the program ahead of time based on predicted control

flow information. During speculative execution, these systems monitor the changes in

the program’s state. If there is a conflict between the main program and the speculative

execution, then the system reverts the changes made by the speculative execution to

return the program to a consistent state. Even with these speculative approaches,

the programmer must still define which operations need to be run together and which

can be broken apart. Further, the programmer may need to supply some or all of

the rollback operation in addition to the program itself. The current major exemplar


of this style of parallelization is Software Transactional Memory (STM). Speculative

approaches have the benefit of not requiring programmers to modify their programs,

but suffer from high overheads when conflicts are frequent or when few processors are

available to speculatively execute program fragments.

By comparison, the non-speculative approaches focus on determining the dependencies

present in a program. This information is used either to avoid parallelizing code when doing so could violate dependencies or to add synchronization that prevents the violation. Both of these approaches have their pros and cons. In this thesis,

my work has focused on a priori reasoning about the presence of inherent parallelism

and it is most closely related to the non-speculative approaches that are the focus of

this chapter.

The existing non-speculative work most directly related to this thesis can be broadly

classified into six major areas: type systems, logics, traditional data dependency and

may-alias analyses, parallel programming languages and APIs, alternative concurrency

abstractions, and object-oriented paradigm considerations. None of these systems combine abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about

inherent parallelism. This thesis draws on ideas from all of these areas. In this chapter,

I aim to discuss the literature related to each of these different areas with a view to

placing my work into context with existing work in the literature and arguing for the

uniqueness of its contributions.

8.1 Background to Type Systems and Data Flow Analysis

Type systems are one of the key tools programmers use to reason about the behavior of

programs. The goal of a type system is to require programs to respect a set of predefined

invariants. A program is said to be well-typed when it respects the invariants required

by the language it is written in. Type systems come in a number of different styles

and strengths, from those which provide only limited invariant enforcement at runtime

to those which provide rigid and complex static invariant enforcement; I refer to these as weak dynamically typed languages and strong statically typed languages, respectively.


Actual type systems form a spectrum between these extremes. The stronger the type

system, the more invariants it can enforce and the more complex those invariants can

be. Unfortunately, this extra information about the behavior of programs, which can be

exploited by both programmers and automated tools, does not come for free. The more

rigid and complex the type system, the greater the effort required by the programmer

to annotate constructs with type information as well as create constructs designed just

to keep the type checker happy.

Programmers have long debated the merits of strong typing versus weak typing and the

merits of static typing versus dynamic typing. Opinions on the best choice tend to vary

with the dominant type systems of the day. When strong statically typed languages

are dominant, many programmers find the limited amount of type information required

when writing a program in a weakly typed dynamic language attractive. The rise in

popularity of dynamic languages such as Ruby, Python, and JavaScript over the last 10

years is evidence of this appeal [119]. As people switch to using weaker type

systems, they become aware that these systems cannot provide the same guarantees as

strong static type systems and, over time, interest flows back to strong static typing

for this reason. The efforts to statically check programs written in languages like

Ruby [43] and JavaScript [118, 62] indicate a need for more checking and validation

than currently provided by these dynamic languages. There is, therefore, a cost-benefit analysis which generally takes place when designing a language and its associated type

system [107]. A good language allows important properties to be reasoned about with

minimal annotation effort (both syntactic and mental) [107].

Type systems have not, traditionally, been used to reason about data dependencies and

parallelism. As was just discussed, type systems aim to enforce invariants across entire

programs. Data-flow analysis, by contrast, can be thought of as a set of techniques

used to discover invariants in a program. Program invariants can be used for a number

of purposes including detecting inherent parallelism and program optimization oppor-

tunities. Laud, Uustalu and Vene have shown that type systems are equivalent to data

flow analyses for imperative programming languages [71]. Some invariants may be most

easily enforced using a type system while others may be more amenable to discovery

using data-flow analysis. Often, the two techniques can be used to complement each


other. For example, when trying to determine the range of possible values a variable

may have, the type system may enforce the range of possible values is limited to a

particular domain, such as the natural numbers, while data-flow analysis may provide

a more precise range of values, such as natural numbers between 0 and 100.

Traditionally, the parallelism community has used data-flow analysis to detect data de-

pendencies and compute side-effects. Programs written in modern imperative object-

oriented languages tend to have more complex aliasing and data flow patterns compared

to programs written in more traditional imperative programming languages. Unfortu-

nately, these additional complexities complicate the application of traditional data-flow

analysis techniques to programs written in these languages. For example, a traditional

context sensitive or context insensitive may-alias analysis is not well suited to use in

an object-oriented program; a superior solution is possible by modifying the approach

to suit the language [86].

The purpose of data-flow analysis in the parallelization process is essentially to discover

invariants which can be used to compute data dependencies between different parts of

the program. Unfortunately, modern imperative programming languages do not have

features to help facilitate the discovery of these program invariants. In this thesis I

have proposed language features which help to facilitate the discovery of these invari-

ants while minimizing the syntactic and semantic burden on the programmer. The language features I have developed may not provide the same precision as traditional dependency techniques, but they are more abstract and composable, which helps to

facilitate reasoning in large programs.

One of the fundamental principles of object-oriented programming is encapsulation;

an object’s internal state should be protected from external access and modification.

The use of encapsulation provides a number of benefits including simplified design,

debugging, maintenance, and reuse. Popular object-oriented programming languages,

like Java and C#, do not provide language features for strong encapsulation enforcement;

they provide only limited name protection. Type systems for encapsulation enforcement

and tracking have been heavily studied by the verification community. These systems

can form a basis for building a hierarchical effect abstraction system which can be used

to help simplify a number of different data-flow analyses.


8.2 Traditional Data Dependency Analysis

The traditional approach to performing a data dependency analysis is to compare,

pairwise, all of the statements in the code fragment being analyzed to determine the

nature of the dependency between the two statements, if any. The data dependency

analysis itself operates at the level of individual variables. These analysis techniques operate on value types, checking for variable name equality to detect whether a dependence

can exist or not. When reference types are encountered, a may-alias analysis needs

to be performed to determine which variables might actually be referring to the same

object at runtime.

My approach, in contrast with traditional data dependence analysis, uses methods as the fundamental unit of effect abstraction. Dynamic binding and overriding can cause

significant problems for traditional data dependency analysis techniques because the

implementation of the method which will be run at runtime needs to be determined.

My approach avoids this problem by enforcing effect consistency during overriding and

so declared effect signatures can be trusted to describe the maximum possible effects

of a method even if that method is later overridden in another class.
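The invariant can be sketched as a simple subset check. In this illustrative Python sketch (not Zal syntax), effect signatures are modeled as sets of hypothetical context names; the helper and names are assumptions for illustration only:

```python
def override_effects_ok(declared: set, overriding: set) -> bool:
    """An override is effect-consistent only if its effects are covered
    by the effects declared on the method it overrides, so a declared
    signature safely bounds every possible dynamic dispatch."""
    return overriding <= declared

# An override that touches fewer contexts is accepted...
override_effects_ok({"this.data", "world"}, {"this.data"})      # True
# ...but one that introduces a new effect would be rejected.
override_effects_ok({"this.data"}, {"this.data", "world"})      # False
```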

8.2.1 Array Dependence Analysis

A number of different techniques have been proposed to try to answer questions about

the relationships between different array index expressions. These techniques operate

only on affine indexing expressions. One of the simplest approaches, proposed by

Banerjee, uses the GCD to determine if the two array index expressions could be

equal [11]. Banerjee requires the loop to be normalized to iterate from 1 to a terminal

value incrementing by 1. Once the loop is in this form, the array index expressions

are arranged into the form a * i + b and c * i + d. Banerjee proved that if a loop

carried dependence exists then GCD(c, a) must divide (d − b).
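The test reduces to a one-line divisibility check, sketched below; this is only the core GCD condition, not Banerjee's full framework:

```python
from math import gcd

def gcd_test_may_depend(a: int, b: int, c: int, d: int) -> bool:
    """Banerjee's GCD test for accesses A[a*i + b] and A[c*i + d] in a
    normalized loop: a loop-carried dependence can exist only if
    gcd(c, a) divides (d - b)."""
    return (d - b) % gcd(c, a) == 0

# A[2*i] vs A[2*i + 1]: gcd = 2 does not divide 1, so no dependence.
gcd_test_may_depend(2, 0, 2, 1)   # False
# A[4*i] vs A[2*i + 2]: gcd = 2 divides 2, so a dependence may exist.
gcd_test_may_depend(4, 0, 2, 2)   # True
```

Note the test is only a necessary condition: when it succeeds, a dependence may exist but is not guaranteed.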

Techniques for detecting when array index expressions could lead to dependencies

through arrays continued to evolve during the 1990s. In 1991, it was proved that

solving a system of constraint equations for array index expressions to determine whether they

could cause a data dependency on the array is an NP-complete problem [75]. One


of the most advanced techniques which allows precise solutions to systems of affine

array index equations to detect dependencies was proposed by Pugh in the form of

the Omega Test [98, 97]. The Omega Test was ultimately developed to show that

for systems of affine equations, the data dependence analysis could be performed in a

reasonable amount of time [68].

The approach I have proposed, which operates on collections and iterators rather than

arrays avoids the affine constraint restrictions common to traditional array data de-

pendence techniques. As with many of these traditional techniques, my system still

requires the elements of the collection being traversed to be unique, which is not an

easily discovered or enforced program invariant. Traditional dependence analysis tech-

niques may be able to find some opportunities for parallelism which cannot be found

using my system because of the abstraction of effects in terms of contexts. What is

gained, however, is a greatly simplified ability to reason about data parallel loops and to

handle more general purpose loops than those traditionally focused on when considering

array data dependence analysis. Finally, my techniques describe effects in an abstract

and composable manner which can be used to prevent the explosion in the number of

required comparisons that occurs with the traditional pairwise consideration

of loop body statements.

8.2.2 May-Alias Analysis

A may-alias analysis is a static analysis and so it does not have access to actual pointers

and objects. Instead of tracking objects and memory addresses, may-alias analyses try

to disambiguate variables using allocation site information. An allocation site is the

program point at which an object is allocated. Therefore, if two objects are created at

the same allocation site, a traditional may-alias analysis would identify that the two

objects could be the same. This is an approximation of the program’s actual behavior.
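The allocation-site approximation amounts to a set intersection, as the following sketch shows; the site labels and helper are hypothetical, chosen only for illustration:

```python
def may_alias(sites_a: set, sites_b: set) -> bool:
    """Conservative may-alias: two variables may alias if some allocation
    site can reach both of them. Sound but imprecise: distinct runtime
    objects created at the same site are still reported as potential
    aliases."""
    return bool(sites_a & sites_b)

p = {"site@L3"}             # p only ever holds objects allocated at L3
q = {"site@L3", "site@L7"}  # q may hold objects from L3 or L7
r = {"site@L9"}

may_alias(p, q)  # True: shared site, even if never aliased at runtime
may_alias(p, r)  # False: provably distinct objects
```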

One important technique used to improve the precision of may-alias analysis is context

sensitivity. In context insensitive analyses, each method body is analyzed in isolation

with no information about the parameters supplied to the method [86]. Context sensitive

analyses are far more precise: each method body is analyzed once for each site it is

called from, and the information from the calling


context is used to help make the may-alias analysis results more precise [86].

Milanova et al. studied the application of Andersen’s C may-alias analysis, a traditional

context sensitive analysis, to the Java programming language [86]. They found that

the analysis does not take into account the target of a method’s invocation. Because of

this, a method which modified one object’s state appeared to be manipulating the state

of all objects of the same type [86]. Milanova et al. proposed a new object sensitive

analysis which distinguished methods invoked on different allocation sites which allowed

the updating of different object states to be distinguished from each other [86]. This

object sensitive may-alias analysis has subsequently been refined and improved by a

number of other authors, most recently Bravenboer and Smaragdakis [22].

Unfortunately, even with the development of new object sensitive analyses, these tech-

niques are still only approximations of the actual behavior of the program. The main

problem is that the languages to which these analyses are applied do not provide any

features to help distinguish aliases or track the flow of data through a program. This

thesis contributes an approach which shows how an object-oriented language can be

modified to help provide this tracking and how this can, in turn, be used to reason

about the behavior of a program. My approach may not work as well for some specific

scientific applications, but it is better able to handle programs written in modern im-

perative object-oriented languages. With further study, a combination of my approach

and more traditional approaches may allow programmers to get the best of both worlds.

8.3 Automatically Parallelizing Compilers

Creating compilers to automatically parallelize sequential programs has long been a

goal of computer science research [15]. A lot of research was done in this area in the

1980s and 1990s with a primary focus on compilers for the Fortran77 programming lan-

guage [51, 17, 11, 10]. This work has resulted in a number of commercial automatically

parallelizing Fortran compilers like the Intel Fortran Compiler and IBM XL Fortran to

name but a few.

The Fortran77 language is a relatively simple language from a program analysis per-

spective [107]. It does not employ any of the language features which make analyzing


programs in more modern languages, such as Java and C#, so difficult, including pointers, ref-

erence types, dynamic binding, and dynamic linking. All of these modern language

features provide avenues for aliasing and Fortran is easier to analyze largely because of

this.

There was also work done on creating parallelizing compilers for C [70] and later Java [8].

The added complexity of these languages meant that the compilers were often unable

to determine if a loop could be safely parallelized. To work around this problem, many

compilers allowed programmers to provide hints to guide the parallelization of their

programs (for example High Performance Fortran for distributed memory systems [54]).

Others built tools where programmers could interact with the compiler to help decide

where parallelism could be safely employed (for example SUIF Explorer [72] and the

Polaris compiler [17]). The hints and parallelization decisions taken by programmers in

these systems go largely unverified, which can allow many subtle bugs to be introduced

into programs through the parallelization process.

My system provides an alternative form of program annotation from the unverified an-

notations employed in previous systems. The annotations employed in Zal are verified

by the compiler, which prevents programmer parallelization annotations from introducing

subtle bugs into programs. Zal is also better able to handle the complex language

features and aliasing patterns found in modern languages than previous automatic

parallelizing compilers.

8.4 Type and Effect Systems

The goal of this section is to provide an overview of type and effect systems for ab-

stracting the structure of a program’s data in a hierarchical and composable manner.

In addition, I compare and contrast the approach I have taken in this thesis to these

systems. It is important to note that the application of these type and effect systems

to the problem of undertaking full data-flow and data dependency analysis has not, to

the best of the author’s knowledge, been attempted previously.

One of the early object-based effect systems was proposed by Greenhouse and Boyland


who developed a memory region-based effect system which abstracted effects using

programmer-defined memory regions [49]. Effects in terms of programmer-declared memory

regions do not abstract and compose well due to the lack of a rigorous relationship

structure underlying them.

8.4.1 FX

FX is a programming language developed in the 1980s [45] which is a member of the

Scheme-like family of functional programming languages. It possesses both purely

functional operations as well as operations that permit the modification of shared mutable

state [45]. The most notable feature of FX was the addition of explicit effect annotations

to the language. Programmers were required to specify the side-effects of evaluating

a function on its signature [45]. The annotations were not verified by the compiler,

but were used by the compiler to determine which pieces of code could be executed in

parallel [45]. The language syntax was greatly complicated by the side-effect annota-

tions. Further, the lack of compiler verification of the methods’ declared side-effects

could result in obscure, hard to reproduce bugs being caused by incorrect or omitted

annotations; the errors produced would provide no hint that the annotations were the

cause of the problem. While the effect annotations introduced in FX were useful for

parallelization, the subtle bugs the annotations could cause demonstrate the impor-

tance of inference, defaults, and compiler verification to ensure that the annotations

are minimally burdensome on the programmer and that they are correct.

8.4.2 Ownership Types

With the popularization of object-oriented programming in the 1990s, software veri-

fication researchers became interested in trying to detect and enforce encapsulation.

Some of the earliest type systems to enforce encapsulation were proposed by Almeida

(Balloon Types [5]) and Hogg (Islands [57]). These systems could enforce only weak

encapsulation invariants. Around the same time as the work of Greenhouse and Boyland,

Ownership Types were proposed as an extension and further refinement of these

early encapsulation systems. Ownership Types provided much stronger encapsulation

enforcement [28, 91]. Subsequent work has liberalized Ownership Types to allow


the systems to be used for tracking encapsulation rather than rigorously enforcing it.

Many common object-oriented design patterns, such as iterators, violate encapsulation

and so by separating the encapsulation enforcement from the encapsulation tracking,

ownership systems can be used to express these patterns which would otherwise be

prohibited.

Over time, two distinct families of ownership type systems have emerged: Ownership

Types and Universe Types. The main difference between these two families is in the

mechanism of tracking encapsulation relationships. Ownership Types track encapsula-

tion using explicit ownership parameters. Universe Types, by contrast, use only relative

notations such as rep on fields holding object representation and shared on fields hold-

ing data which are part of another object’s representation. An excellent summary of the

early work in the field can be found in “Types for Hierarchic Shapes (Summary)” [41].

In 2006, Nageli published a master's thesis which showed that a number of common

object-oriented design patterns could not be expressed in languages employing the then

state-of-the-art Universe and Ownership Types systems [90]. Nageli identified a number

of type system features required to make ownership systems in general compatible with

the design patterns studied. These features included ownership transfer, read only

references, and an ability to share objects between restricted sets of contexts [90].

Following Nageli’s thesis, a number of new ownership type systems have been published

which begin to address the shortcomings he identified. MOJO [25] allowed objects to

have more than one owner, thus transforming the traditional ownership tree into a

directed acyclic graph (DAG). MOJO also provided wildcards so that the types of

objects with multiple owners can be named without all of the owning contexts being

nameable. This simplified writing programs using the system [25]. Cameron’s Jo∃

added existential types to traditional ownership systems to allow provably-safe, con-

strained ownership variance [24]. Lu and Potter proposed Effective Ownership Types

which added a wildcard any context, which can be used to abstract owners of types

along with effective owners on methods to constrain the mutable side-effects of the

method [73]. Lu, Potter, Xue have also proposed Oval [74], a language which employs

validity contracts to determine which parts of a system a given method depends on and

which parts it may modify. The validity contract consists of a list of contexts which


must be valid before the method’s execution and a list of contexts invalidated by the

method. This is a more general approach to the standard read and write effects such

as those I have used in my system. Using these validity contracts, it might be possible

to reason about parallelism, but no work has been published on this to date.

There has also been an effort to create formal core calculi to prove type system properties.

Some of the more notable are Ownership Generic Java [95] (Featherweight Generic Java

extended with ownerships) and System Fown [69], which adds ownership to System F.

System F, otherwise known as the second-order λ-calculus, is a general calculus for

languages with parametric polymorphism and can be used to model the behavior of many

other languages. Adding ownership to System F is therefore an important result from a type theoretic

point of view since it generalizes ownership to a large family of languages. These formal

calculi are not directly applicable to reasoning about parallelism, but they can be used

as the basis for type and effect systems which can be used to prove properties useful

for parallelization.

Ownerships Types research has produced a family of type systems with a number of

advanced features designed to facilitate typing of a number of complex programming

patterns and object relationships. The choice of which language features to employ is

determined by the types of program being analyzed and the analyses being employed.

In my ownership system, I use language features and ideas from a number of different

sources. My use of context parameters and effects resembles those of Joe [27]. The sub-

contexts found in my system resemble domains in Ownership Domains [3]. The effect

declarations used resemble those of Smith [108] and Joe [27]. Finally, the constraint

clauses resemble those found in MOJO [25]. A more in depth discussion of these points

is undertaken in Section 5.1 and I refer interested readers there. My system is capable

of supporting many of the latest and greatest language features in Ownership Types,

but I have not had a need to use them as yet.

8.4.2.1 Ownership Side-Effects

Early in the development of ownership systems, it was realized that the hierarchical

ownership structure was well suited for use as a framework for abstractly describing


side-effects. These effects have been applied to several different problems, generally

related to validating programs. The idea of using these systems for reasoning about

parallelism has been mentioned in the past, but has not been thoroughly explored or

demonstrated before this thesis.

Clarke and Drossopoulou were amongst the first to propose an effects system based on

Ownership Types called JOE [27]. JOE provided facilities for capturing and validating

effects. JOE also included a system for statically reasoning about the disjointness of

effects. The disjointness operations formulated in JOE resemble those in Zal, but the

effects were not used to reason about data dependencies or other program properties.

JOE’s effect system focused on tracking write effects and did not include separate read

effects. The separation of read and write effects is important for computing accurate

data dependency information for use in finding inherent parallelism, as I found necessary

when formulating sufficient conditions.
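Why the separation matters can be seen in a Bernstein-style independence check over abstract effect sets. The sketch below is a simplification of my sufficient conditions, and the context names are hypothetical:

```python
def independent(reads1: set, writes1: set, reads2: set, writes2: set) -> bool:
    """Two computations are independent if neither writes a context the
    other reads or writes. Read/read overlap is harmless, which is why a
    write-only effect system must conservatively reject cases that a
    system with separate read and write effects can prove parallel."""
    return not (writes1 & writes2 or writes1 & reads2 or reads1 & writes2)

# Both iterations only read the shared context: parallelizable.
independent({"shared"}, {"left"}, {"shared"}, {"right"})   # True
# One iteration writes what the other reads: must stay sequential.
independent({"shared"}, {"left"}, {"left"}, {"right"})     # False
```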

Lu and Potter proposed Effective Ownership Types [73] which built on traditional

Ownership Types systems and focused on trying to separate ownership based effect

systems from encapsulation enforcement. Their system was not directly applied to any

real-world problems, but did present a number of innovative ideas. The any context in

their system is, for my purposes, equivalent to the world context when read or written.

This limits parallelization opportunities because a read or write of the any context

includes all other contexts in the system. In addition, the lack of method side-effect

declarations in Effective Ownership Types limits the ability to reason about side-effects

in the presence of overriding.

Several of the more recent Ownership Types systems including MOJO [25] have begun

to include effect systems as a matter of course in formulating new ownership systems.

These formalizations look at how the new language and ownership features proposed

impact the effect system. However, these type system papers do not apply the effects

system to actual programming problems such as parallelism.

8.4.2.2 Applications to Parallelism

In the literature there have been suggestions of using ownership types for purposes

relating to parallelism. This work has tended to focus on the validation of existing


parallel programs and lock ordering in particular. Boyapati, Lee, and Rinard proposed

a system for validating lock ordering using Ownership Types [20]. Their effect system

captures the contexts locked by a particular block of code and succeeds in preventing

deadlock using these annotations. Milanova and Liu have presented a system which uses

the notion of ownerships to detect synchronization errors [85]. They do not modify the

type system, but use the notion of ownership and encapsulation to construct a graph

of the relationships between objects which is used in their analyses. Unfortunately,

these systems capture effects as sets of contexts locked and not as a set of contexts

read and written. The sets of contexts locked do not provide sufficient information to

reason about inherent parallelism. In addition, my system adds a runtime system which

allows the relationships between context parameters and the disjointness of elements

in collections to be tested; a feature not found in these parallelism validation systems.

8.4.2.3 Summary

The area of Ownership Types has rapidly matured in the last few years. The literature

documents a number of advanced type systems with features to support a number of

different programming styles and reasoning systems. The choice of which to employ

in a particular language is determined by the problems the language is expected to

be applied to. A number of effect systems have been proposed based on Ownership

Types. To date, the parallelism applications have been limited to validation. Lock order

validation is a different problem that requires less complex reasoning than the general

computation of side-effects required to detect inherent parallelism (see Chapter 5) and

Zal (see Chapter 4).

8.4.3 Universe Types

Universe Types is an alternative branch of Ownership Types which is designed to facili-

tate encapsulation tracking and enforcement [88]. Universe Types are significantly more

lightweight than Ownership Types in terms of annotation overhead, both in the number

of annotations and their complexity. Rather than having to specify explicit owners on

types, fields are simply marked as being part of the representation, shared with other

objects, or other relationships depending on the specific system used. This reduced


complexity has both advantages and disadvantages. The reduced complexity makes

it easier to implement some language features which have been quite hard to imple-

ment in Ownership Types such as ownership transfer [89] and ownership inference [83].

Recently, Dietl has published a PhD thesis, in which he has cleanly separated the own-

ership annotations from encapsulation enforcement so that the ownership annotations

can be used simply as a hierarchical description of the program’s data structures [38].

The less precise nature of the effects possible with Universe Types makes the system less

suitable than Ownership Types for reasoning about data dependencies and parallelism.

Effects can only be described as occurring on objects in a representation, peer, or other

relationship to the current object. Using such a system, it is not possible to determine

whether references to objects with the same type and annotation can interfere, which

is much less useful than the more precise owner effects possible with Ownership Types.

8.4.3.1 Applications to Parallelism

Cunningham, Drossopoulou, and Eisenbach proposed a system for validating a program

to check that locks are taken on shared resources to prevent data races [34]. As was the

case with the work of Boyapati et al. [20] discussed above, the effect system does not

capture effects in a sufficiently broad and precise manner to facilitate reasoning about

inherent parallelism as my proposed system does.

8.4.4 Ownership Domains

In Ownership Domains, proposed by Aldrich and Chambers [3], programmers declare

explicit memory regions, called domains, in which to store data. The key difference

between Ownership Domains and the earlier work of Greenhouse and Boyland [49] is

that Ownership Domains use ideas from Ownership Types to provide some default re-

lationships between domains in addition to allowing programmers to declare explicit

access permissions on domains. Ownership Domains also allow types to be parameter-

ized with domain parameters so that domains can be passed to objects. The domain

parameters may be constrained to restrict the domains which can be supplied when

instances of the type are constructed. The sub-contexts in my proposed system are

similar to the domains in Ownership Domains. Ownership Domains are flexible and


can capture more complex and subtle object relationships than traditional Ownership

Types. Building a hierarchy from the declared domains is much more complicated due

to the complex inter-domain relationships that are permitted in Ownership Domains.

8.4.4.1 Effects

Smith has proposed an effect system based on Ownership Domains [108]. Smith’s

proposed effects system employs explicit read and write sets such as those used on

methods in my system. Like the system I propose, Smith's system has the ability

to constrain relationships between domain parameters. This is equivalent to the

context constraints used in my system. Some of Smith’s effects resemble the effects I

have chosen to employ, while others not found in my system are based on additional

expressiveness inherent in Ownership Domains. These additional annotations may be

useful in solving parallelism related problems, but come at the cost of the increased

annotation complexity and overhead of the Ownership Domains system. Smith made

no attempt to apply his effects system to a specific domain such as parallelism. While

the work contributes a number of interesting ideas which may be useful, it does not

discuss how to use the system to solve specific problems.

8.4.5 Boxes

Boxes [106] and Loose Ownership Domains [104] are both type systems based on Own-

ership Domains. As with basic Ownership Domains type system, the programmer

explicitly declares the regions in which objects live and can control the access per-

missions and relationships between regions. These type systems add to Ownership

Domains a mechanism for abstracting domains. This increases the flexibility

and modularity of the type system. The use of sub-contexts in my proposed system is

basically a subset of the functionality of these type systems and provides some of the

same power and flexibility without the need to always declare domain relationships.

The Boxes type system has been applied to the actor programming model to help

enforce object encapsulation in the CoBoxes type system [105]. In the CoBoxes type

system, classes can be marked as CoBoxes. Each CoBox has at least one stream of


execution and invoking methods on the CoBox is done asynchronously. The domains

in the type system enforce encapsulation so that data in CoBoxes is disjoint to prevent

data races between CoBoxes. Similarly to other actor and message passing based

systems, the CoBox concurrency model is powerful and flexible, but often requires

existing programs to be restructured before they can take advantage of the language’s

parallelism features. Further, the language does not assist the programmer with the

task of deciding which classes should be CoBoxes and which should not, unlike the system

I have proposed.

8.4.6 Uniqueness, Read-Only References and Immutability

The major alternative to using Ownership Types for encapsulation enforcement involves

the use of unique and read-only references to objects. By restricting a reference to be

read-only, the reference cannot be used to violate encapsulation since modifications of

the object being referred to are not permitted. By restricting access to encapsulated

state, the process of ensuring invariants are enforced is greatly simplified.

The origins of this approach can be traced to Hogg’s Island types [57]. In Hogg’s

system, objects are grouped; within each group unrestricted aliasing is allowed, but

external aliases to the group must be made via the single bridge object that connects

the objects in the “island” to the rest of the world.

Uniqueness type systems have not been as well studied on their own, but they have

been used in conjunction with other encapsulation tracking and enforcement systems to

provide greater flexibility and expressive power. There have been efforts to add these

features to traditional Ownership Types systems including Featherweight Ownership

with Immutability Generic Java [129]. No major efforts have emerged trying to use

these properties to exploit the side-effect restrictions implied by these techniques for

parallelism verification or parallelization purposes.

8.4.7 SafeJava

SafeJava is an extension of the Java programming language. Its type system is based

on Java’s type system extended with ideas from Ownership Types [19]. The simplicity


and flexibility of the type system falls somewhere between that of Ownership Types

and Universe Types. SafeJava validates synchronization between threads to prevent

data races and deadlocks as well as provide region-based memory management. The

system for reasoning about data races and deadlocks focuses on reasoning specifically

about locks. Methods can be annotated with lists of locks that must be held when

the method is invoked and lists of locks taken by the method. There are a number of

advanced features for reasoning about locks and thread local storage, but the system

does not attempt to discover where inherent parallelism exists. This is a very different

problem since it requires discovery of data dependencies using effect information.

8.4.8 Deterministic Parallel Java

In late 2009, Bocchino et al. published a paper on Deterministic Parallel Java (DPJ) [18],

which is probably the closest competitor to the system I have presented in this thesis.

DPJ uses a variant of Ownership Types to reason about data dependencies and

inherent parallelism.

The major difference between DPJ and Zal lies in how contexts are specified. In

DPJ, contexts are specified using Region Path Lists (RPLs) rather than simple names.

These RPLs list the contexts between the context being named and the root. Parts of

the list may be abstracted using an asterisk (*). This difference in how contexts are

named has a significant impact on how context disjointness is determined. In DPJ,

the disjointness of two contexts can be determined by comparing their RPLs to see if

there are any common contexts which would indicate that the named contexts share a

parent-child relationship. The use of RPLs has allowed DPJ to successfully

detect and exploit inherent parallelism in a number of traditional scientific computing

benchmarks. Unfortunately, the use of these RPLs could cause implementation details

to be leaked through the context and effect abstractions. Appropriate use of the asterisk

(*) may help to prevent this, but even occasional leakage could prevent the system

from composing as mine does. In addition, the use of RPLs requires programmers to

maintain awareness of the ownership structure in their entire program rather than just

local ownership information as is required in Zal.
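To give a flavour of the annotations involved, the following is an illustrative sketch of DPJ-style region and effect declarations. The syntax is approximated from my reading of [18], and the class is an invented example rather than code from the DPJ distribution:

```
// Hypothetical DPJ-style class; 'region', 'in', and the effect clauses
// are DPJ extensions to Java, approximated here for illustration only.
class Tree<region R> {
    region Left, Right;          // subregions nested under R
    int value in R;              // field data placed in region R
    Tree<R:Left> left;           // RPL: R followed by Left
    Tree<R:Right> right;

    // Effect summary: writes R and everything nested below it.
    void increment() writes R:* {
        value++;
        if (left != null) left.increment();    // writes R:Left:*
        if (right != null) right.increment();  // writes R:Right:*
    }
}
// Because the RPLs R:Left:* and R:Right:* differ at a distinguishing
// position, the two recursive calls touch disjoint regions and could
// legally be run in parallel (e.g. in a DPJ cobegin block).
```

Note how the RPLs name a path through the ownership structure, which is the source of both DPJ's precision and the non-local awareness discussed above.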

Unlike Zal, DPJ does not make use of a runtime ownership tracking system and so does


not provide support for conditional parallelization. DPJ requires the same collection

disjointness as Zal, but takes a different approach to enforcing the condition. In DPJ,

the compiler allows the programmer to use a form of dependent types to try to verify

statically that element access operations in an array are disjoint and can be run in

parallel. This is a complex undertaking and requires the programmer to identify and

expose index variables and other implementation details to the DPJ compiler, unlike the

iterator based approach used by Zal. The DPJ approach is well suited to traditional

scientific computing tasks, the focus of the DPJ system, but is more complex and

exposes more implementation details than the approach I used in Zal. I chose to

use a runtime disjointness test so that more general purpose loops could be handled

by Zal. In terms of sufficient conditions for parallelism, DPJ uses RPL effects in a

traditional data dependency analysis. The authors do not propose sufficient conditions

in terms of RPLs. The sufficient conditions I have presented for Zal can be used by

programmers, as well as automated tools. This helps facilitate reasoning about side-

effects and dependencies.

DPJ is a very powerful, flexible, and useful system, but it lacks the ability to compose

as effectively as Zal. The DPJ authors have demonstrated that their system is able to

parallelize a number of traditional scientific benchmarks including benchmarks from the

Java Grande suite which Zal is not currently able to parallelize. The RPL ownership

annotations employed by DPJ are more complex than the simple ownership annotations

employed by Zal. As a result, it is harder to annotate a program using DPJ RPL-style

annotations than it is to annotate the same program with Zal-style annotations. Zal’s

use of runtime ownership tracking allows for data-dependent program behavior which

is not possible with DPJ and so Zal can conditionally exploit parallelism in situations

where DPJ cannot (for example, the hash table example in Chapter 4). It may be

possible to combine the features of these two systems in the future to have the best of

both worlds, but how to do so is an open research question.

8.5 Logics

Computer scientists have long used formal mathematical tools to try to reason about

programs. Some of these techniques employ numerical methods. Other techniques


rely on formal logic to try to prove properties of programs. In this section, I briefly

discuss two logic systems which can be used to reason about data dependencies and

parallelism and how these formal systems relate to the system I propose in this thesis.

These systems provide a formal basis from which to build, but they are of limited

direct practical value in that they cannot simply be added to a language. The logic systems

are abstract reasoning systems and are not formulated for a specific language or set

of syntactic features. These logic systems, generally, need to be incorporated into a

language’s type and effect system at some level. This is usually a non-trivial task.

8.5.1 Hoare & Separation Logic

One notable contribution towards describing programs logically and proving properties

through the development of axioms and traditional proof techniques was made by Hoare

in his 1969 paper “An Axiomatic Basis for Computer Programming” [56]. In Hoare

Logic, as this contribution has been named, programs are described in terms of the

pre-conditions which must hold before they are run and post-conditions which describe

the result of the program's execution. Hoare introduced the notation P{Q}R, where

P is the set of pre-conditions, R is the set of post-conditions, and Q is the program to

which the conditions apply.

Hoare Logic has been applied to many problems, but the most relevant of these for the

purposes of reasoning about parallelism is Separation Logic, discovered simultaneously

by John Reynolds [101] and Ishtiaq and O'Hearn [61]. In Separation Logic, the pre and

post conditions describe the state of the program before and after execution as they do

in Hoare Logic [56]. Parallelism is determined by proving the independence of the pre

and post conditions for two programs to be run in parallel. As long as the state required

and modified by the two programs, as specified in the pre and post conditions, does

not contain any shared values there are no shared dependencies between the programs

and they can be run safely in parallel [102]. Reynolds presented pre and post

conditions for some common language features to demonstrate the use of the system

and its power [102].
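The independence argument can be stated compactly. Writing Hoare triples in the modern {P} C {Q} form, the rule for disjoint parallel composition in Separation Logic is as follows (this rendering follows the standard presentation in the literature and is mine, not quoted from [101, 102]):

```latex
% Hoare triple: if precondition P holds, running C establishes Q.
\{P\}\; C\; \{Q\}

% Disjoint concurrency rule: the separating conjunction * asserts that
% P_1 and P_2 describe disjoint portions of the heap, so C_1 and C_2
% cannot interfere and may safely run in parallel.
\frac{\{P_1\}\, C_1\, \{Q_1\} \qquad \{P_2\}\, C_2\, \{Q_2\}}
     {\{P_1 * P_2\}\;\; C_1 \parallel C_2\;\; \{Q_1 * Q_2\}}
```

(with the usual side condition that neither command modifies a variable free in the other's triple).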

Separation Logic is a strong theoretical foundation for reasoning about programs due to

its derivation from the fundamental logic underpinning programming. There has been


work on using Separation Logic to reason about inherent parallelism in programs, but

this work has focused on conducting proofs and validating parallelization techniques [99].

The application of separation logic to real-world language features is complex and

difficult. The average programmer would not want to be burdened with having to

describe their entire program in terms of Separation Logic pre and post conditions.

8.6 Programming Languages for Parallelism

Each programming language has a combination of features and syntax which makes it

unique. Often, new language features are created to solve a problem not easily solved

with existing language constructs. In this section, I examine several different languages

and compare and contrast their approaches to parallelization with my system. When

parallelizing a program, there are three questions which need to be answered:

1. Can the program be safely parallelized?

2. Should the program be parallelized; is it worthwhile doing so?

3. How should the program be parallelized?

Existing languages generally address only one or two of these questions. Overall, there is

no language with simple syntax and imperative-style computation that provides features

which facilitate reasoning about data dependencies and parallelism. The languages

discussed in this section have a number of interesting features for expressing parallelism,

but there is no validation of whether the parallelism added to programs using these

constructs violates sequential consistency or not.

8.6.1 Haskell & other Functional Languages

The Haskell programming language was the product of an academic initiative to create

a standard language platform for performing research into functional programming [59].

Haskell is the current major exemplar of a lazily evaluated functional language and so

includes a number of unique features [59].


Having been created for functional programming research, Haskell was designed as

a purely functional programming language with no shared mutable state and conse-

quently all functions are side-effect free. Initially, operations with inherent side-effects,

such as I/O, were modelled with great difficulty and some side-effecting operations had

to be implemented outside the Haskell language itself [59]. This problem was solved

by the introduction of monads into Haskell to model mutable state, I/O, and all other

operations with side-effects [64].

The lack of side-effecting operations in purely functional programming languages means

that multiple function invocations can be evaluated in parallel. Consequently, these

purely functional programming languages help programmers to answer the question

of whether the program can be parallelized. These languages do not, however, address the

questions of whether the parallelism should be exploited or how to exploit it when it

is worth exploiting.

Haskell uses monads to add operations with side-effects to the language without

violating its functional purity. Monads provide a type system mechanism for

sequencing side-effecting operations so that the Haskell interpreter respects the speci-

fied order for the side-effecting operations. Essentially, monads are a means of explicitly

sequencing operations in an inherently parallel language; my system, by contrast,

tries to expose parallelism in an inherently sequential language. Because

monads provide a means of explicitly sequencing operations, it is possible that there

could be inherent parallelism amongst the sequenced operations. The Haskell type

system does not provide features to facilitate reasoning about the inherent parallelism

which may exist within operations sequenced using monads.

Haskell, like many other functional programming languages, also has a number

of other powerful syntactic constructs. The most interesting of these is the list compre-

hension. List comprehensions allow for the generation of infinite data sets as well as the

application of operations to elements of a list without the need for explicit iteration over

the list itself [59]. This construct is ideal for parallelization since dependency detection

is greatly simplified and the construct is highly declarative in nature. The use of this

construct removes indexing and other iteration-specific notation. By making the

iteration implied rather than explicit, loop-carried dependencies are no longer an issue. The


transformation being applied to each data element is also made explicit, which is ideal

for parallelization and dependency detection. The enhanced foreach loop proposed

in my system (see Section 2.4.1.2) is another, more declarative, means of writing a

loop. It is not as powerful and flexible as Haskell list comprehensions, but it serves a

similar purpose while providing a syntax more similar to traditional loops than that of

Haskell list comprehensions.
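A similar implicit-iteration style has since appeared in mainstream object-oriented platforms. As a rough analogue (my illustration using Java's standard Stream API, not an example from the thesis), an element-wise transformation can be written with no index variable and no loop-carried state, which is precisely what makes it straightforward to parallelize:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ComprehensionAnalogue {
    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);

        // Comparable to the Haskell comprehension [x * x | x <- xs]:
        // the iteration is implied, so the runtime is free to evaluate
        // the mapping of the elements in parallel.
        List<Integer> squares = xs.parallelStream()
                                  .map(x -> x * x)
                                  .collect(Collectors.toList());

        // collect() preserves encounter order even for parallel streams.
        System.out.println(squares); // prints [1, 4, 9, 16]
    }
}
```

The declarative form carries no hint of evaluation order, so switching between `stream()` and `parallelStream()` cannot change the result of a side-effect-free mapping.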

Haskell is a powerful language with many desirable parallelization properties. The lazy

functional programming paradigm is quite different from the imperative

programming paradigm used by the majority of developers writing general purpose software.

These differences can create a barrier for those who would otherwise be interested in

using Haskell to improve the performance of their programs. While monads order

side-effecting operations, exploiting any inherent parallelism between the ordered operations

is not currently supported by the Haskell language. Overall, the Haskell language of-

fers a number of interesting and powerful features which may contribute to solving the

problem of how to parallelize programs written using modern object-oriented languages,

but it does not provide a complete solution to the problems tackled in this thesis.

8.6.2 Cyclone

Cyclone is a C dialect designed to enforce memory safety; that is, to prevent buffer

overruns, memory leaks, and other memory related problems endemic in programs

written in pure C [63]. Region-based memory access is used to ensure the safety of

memory operations [63].

In Cyclone, all pointer types are annotated with the region they reference either manu-

ally in the code by the programmer or automatically by an inference engine at compile

time [50]. The Cyclone compiler then uses the region annotations to determine if the

memory pointed to by any given pointer is still allocated and if assignment to the lo-

cation is permitted [50, 117]. The Cyclone type system employs effect annotations on

function signatures to determine which regions the function may access [50]. These ef-

fects are used to ensure that pointers cannot escape through existential types [50]. It is

conceivable that such a system could be used to reason about parallelism by computing

dependency information based on these effects, but this has not been explored to date.


Significant amounts of C code have been ported to the Cyclone platform and the pro-

grammer burden of this porting has proven minimal thanks to the annotation inference

engine [63, 50]. It is likely that such a system would require significant modification if

it were to be used with object-oriented languages, but this project demonstrates that,

with an appropriate inference engine, annotation heavy language syntax can be used

with minimal burden on the programmer. It also shows that reasoning about memory

interactions is possible, even in highly unstructured languages such as C. The Cyclone

region-based memory model is more general than the ownership types or unique alias

models of encapsulation enforcement and so may contribute useful concepts towards

reasoning about data dependencies and, hence, parallelism.

The ownership and effect system I have used to reason about side-effects and inherent

parallelism has been applied only to the safe subset of the C# language. The unsafe

subset of C# adds pointers to the language syntax and disables some of the language's

automatic memory management. The use of pointers and pointer arithmetic means

that it is much harder to track where in the stack or heap a pointer is pointing, which

makes it difficult to determine what part of the stack or heap is being read

or written via the pointer. Cyclone uses a region model that is less structured than

the Ownership Types memory model; Cyclone regions do not have an implicit nesting

hierarchy associated with them as ownership contexts do. This less structured model is

ideally suited to modelling memory interactions in highly unstructured unsafe code. In

the future, when looking to expand Zal to include unsafe code, Cyclone regions may be

a useful starting point. For example, Cyclone has a syntax for annotating pointers with

region information which may be suitable for use with ownership contexts. Cyclone has

mechanisms for validating constraints on the areas of the heap and stack to which a

pointer may refer.

8.6.3 Scala

Scala was developed as a research programming language at the École Polytechnique

Fédérale de Lausanne and has subsequently gained some acceptance in the wider

programming community [93]. The core of the language is based on the object-oriented


programming paradigm, but with the addition of functions as first-class citizens (in-

cluding higher-order functions, partial evaluation, and continuations) [93]. While the

blend of paradigms and features in Scala is interesting, the most relevant feature of the

language to this thesis is its concurrency model.

In Scala, the basic unit of concurrency is the actor. Each actor runs on its own

lightweight thread [52]. Communication between actors is achieved through message

passing with non-blocking send and blocking receive semantics and pattern matching in

the receiver to decide how to process the received messages [52]. Lightweight threads

are virtual machine artefacts which are dynamically mapped onto operating system

threads and processes by the virtual machine during program execution [52]. This

model of computation has achieved great success in the Erlang functional program-

ming language [7] which is well regarded in both academia and industry and has been

gaining popularity. I have focused on discussing Scala rather than Erlang because it

demonstrates how the Erlang parallelism model can be modified for use in an object-

oriented language. Actor-style parallelism is well suited to exploiting coarse-grained

parallelism between entities in a system, but it does not lend itself well to the exploita-

tion of opportunities for fine-grained parallelism. Further, existing programs may need

significant restructuring to fit the actor concurrency model.
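The send/receive discipline described above can be imitated with standard JVM primitives. The following is a minimal sketch of the pattern (my own illustration of asynchronous send and blocking receive, not Scala's actual actor implementation):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MailboxSketch {
    public static void main(String[] args) throws InterruptedException {
        // The actor's mailbox: put() on an unbounded queue never blocks
        // (asynchronous send); take() blocks until a message arrives
        // (synchronous receive).
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();

        Thread actor = new Thread(() -> {
            try {
                String msg = mailbox.take(); // blocking receive
                System.out.println("actor processed: " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        actor.start();

        mailbox.put("ping"); // non-blocking send from the main thread
        actor.join();
    }
}
```

Because all communication flows through the mailbox by value, the two threads share no mutable state, which is the property the actor model relies on for safety.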

Scala and Erlang provide tools to allow the programmer to annotate in their program

which parts can be safely parallelized, but they do not support verification that the

parallelism added by a programmer is actually safe to exploit. Scala and Erlang also

take care of deciding how to exploit the parallelism through the language supplied

implementation of lightweight threads. The major difference is that these languages

require the programmer to restructure their program to fit the parallelism pattern built

into the language and they do not supply tools to facilitate this restructuring nor do

they validate any parallelism explicitly exposed by the programmer.

8.6.4 High Productivity Computing Languages

The Defense Advanced Research Projects Agency in the United States has funded a

High Productivity Computing Systems (HPCS) research program over the last few

years [35]. The goal of this project is to develop technology to allow a multi-petaflop


computer system to be built along with tools to allow programmers to efficiently write

scientific and cryptographic applications to be run on the new hardware. A number

of companies participated in this project and prototype development of three systems

from Sun Microsystems, IBM, and Cray was funded [35]. The development of these

prototypes produced three new programming languages: Fortress [115], X10 [103], and

Chapel [33] from each of the companies respectively. Each of these languages includes a

number of features for expressing parallel algorithms, but there is no verification of the

correctness of the parallelism encoded in a program or for identifying where additional

exploitable parallelism may be found in programs.

8.6.5 Spec#

Spec# is a superset of C# which allows programmers to encode pre-conditions, post-conditions,

lists of exceptions that can be thrown, and declarations of the variables and

fields modified by a method [13]. The declarations are used to facilitate the

enforcement of object invariants and design assumptions by the compiler and by the runtime

system called Boogie [13]. The overall goals of the language are to reduce the number

of errors that go undetected at development time, thereby reducing software development costs,

and to increase the quality of the software produced [13]. The syntactic extensions in Spec#

are quite verbose and can impose a significant additional burden on the programmer.

Some representation exposure also occurs through the method contracts. This is not

generally desirable given the well-known problems caused by abstractions leaking

implementation details. Spec# is an example of a language extension which allows Hoare

Logic-style pre and post conditions to be enforced in a program. There has been some

work inspired by Spec# and Boogie which has used similar techniques to reason about

parallelism. One example of a system which does this is Oval [74] which was discussed

in Section 8.4.2. Overall, the Spec#/Boogie style of enforced contracts allows a number

of program invariants and properties to be captured in a program and validated. Fully

generalized pre and post conditions, such as those found in Spec#, are powerful, but

writing these conditions can impose a significant additional overhead on programmers [74].

Spec# serves to demonstrate the practicality of Separation Logic-driven approaches, but

does not directly contribute to systems for reasoning about parallelism.
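The flavour of these contracts can be approximated in plain Java with explicit runtime checks. This is an illustrative emulation of mine; real Spec# uses dedicated requires/ensures syntax that is checked statically by the compiler and Boogie rather than at runtime:

```java
public class Account {
    private int balance;

    // Emulating a Spec#-style contract with runtime checks:
    //   requires amount > 0;
    //   ensures  balance == old(balance) + amount;
    public void deposit(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("requires violated: amount > 0");
        }
        int oldBalance = balance; // snapshot for the 'old(...)' clause
        balance += amount;
        if (balance != oldBalance + amount) {
            throw new IllegalStateException("ensures violated");
        }
    }

    public int balance() {
        return balance;
    }
}
```

Writing every contract by hand like this illustrates the verbosity problem noted above: the checks quickly dwarf the method bodies they protect.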


8.7 Alternative Concurrency Abstractions

Threads are one of the most popular concurrency abstractions in mainstream

programming languages today. Threads are an excellent abstraction for certain types of

parallelism or for situations where precise control of the data synchronization process is

required. There are, however, a number of other concurrency abstractions, provided

as libraries or language extensions, which offer higher-level parallelism constructs.

This section will discuss several of these systems. The key point that will be demon-

strated is that there is little or no support in these tools for ensuring program semantics

are preserved when sequential programs are transformed into parallel programs using

these constructs.

8.7.1 Futures

The Futures model of parallel computation was first proposed as part of the MULTIL-

ISP programming language in 1985 [53]. More recently, futures have been added to the

Java Standard Edition platform in version 5.0 [122] and the .NET Platform in version

4 [80]. When a future is evaluated, the expression immediately returns an undetermined placeholder

and a new thread of computation is started to evaluate the expression required to

produce the actual value [53]. When the value is computed, it replaces the undetermined

placeholder. If the undetermined placeholder is accessed before the computation of

the required value is completed, the thread performing the read is suspended until the

future completes execution and returns a value to replace the placeholder [53].
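This placeholder behaviour is exactly what `java.util.concurrent.Future` provides. A minimal example against the standard Java 5+ API follows (the computation itself is my own trivial illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // submit() returns immediately: the Future object is the
        // undetermined placeholder while another thread evaluates
        // the expression.
        Future<Integer> placeholder = pool.submit(() -> 6 * 7);

        // get() is the touch: it blocks until the value is determined,
        // then yields the value that replaces the placeholder.
        System.out.println(placeholder.get()); // prints 42

        pool.shutdown();
    }
}
```

Note that, unlike MultiLisp's transparent placeholders, the Java API makes the touch explicit through the `get()` call.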

This model of computation works well in purely functional programming languages

where there are no side-effects from method invocation. Unfortunately, in Java and

C#, expression evaluation can result in observable side-effects which are not captured

in the future model of computation as it was originally implemented [122]. Welc, Ja-

gannathan, and Hosking implemented a system of object versioning and dependency

determination that relieved Java programmers using futures of manually synchronizing

all access to shared state, thereby accounting for the side-effects of futures [122].

Unfortunately, the solution proposed is complex and improves performance only when specific

types of parallelism are present. My Zal compiler makes use of futures to expose task


parallelism discovered in programs.

8.7.1.1 Task Parallel Library & Parallel LINQ

In 2010, Microsoft released the Task Parallel Library (TPL) and Parallel LINQ (PLINQ)

as part of the .NET 4 platform [79, 80]. The TPL provides facilities for exploiting data

parallelism and task parallelism in .NET Platform languages such as Visual Basic.NET

and C#. Data parallelism can be exploited through the use of parallel for and foreach

loops while task parallelism can be expressed using tasks which allow for constructs such

as one-way calls and futures. C# 3 and Visual Basic 9 introduced Language Integrated

Query (LINQ), which allows data sources to be manipulated in the language using

an SQL-like syntax. PLINQ provides an API which can be used to parallelize the

execution of a LINQ query (see Section 4.5).

Both the TPL and PLINQ supply only methods which facilitate the exploitation of

parallelism; they do not provide any means of checking the correctness of programs

parallelized using the API. This means these tools are useful for programmers with

specialist parallelism training, but they do not help programmers without this training

correctly parallelize their applications. My Zal compiler makes use of the TPL's parallel

foreach loops to implement data-parallel foreach loops. They are also used in the

supplied implementations of my enhanced foreach loop API.

8.7.1.2 OpenMP

OpenMP is a set of APIs and syntactic extensions for C, C++, and FORTRAN which is

designed to make it easier to exploit fine-grained parallelism in applications on shared

memory computers [94]. Programmers are required to explicitly annotate the par-

allelism in their code as well as provide any necessary synchronization between the

different threads of execution [94]. The annotation system is large, complex, and re-

quires background knowledge of parallelism and synchronization techniques. Further,

the programmer is responsible for understanding different patterns of interaction which

may occur and providing appropriate guards in the code to prevent races and deadlocks.

Programs must be transformed so that the parallelism to be exploited is exposed in a

manner consistent with the provided APIs. Once transformed, the lack of validation


of the correctness of the parallelized program means that errors in the parallelization

can trigger unrelated error messages, making debugging the program quite difficult.

OpenMP helps to simplify the syntax required to expose and manage parallelism, but

it does not help programmers unfamiliar with parallelism find and safely exploit inher-

ent parallelism.

8.7.2 Message Passing

Message passing is a common parallelism pattern where different processes or threads

communicate by passing messages by value to one another. Features supporting

this style of parallelism can be found in Smalltalk [67] and have been included in other

object-oriented systems [2, 67, 121]. In this section I will compare and contrast my

proposed system with several different message passing parallelism models.

8.7.2.1 Actor Model

The actor pattern of concurrency was first described by Hewitt, Bishop, and Steiger in

1973 [55]. This work was subsequently generalized by a number of other researchers

including Agha, who proved the composability of actor-based systems as well as

presenting a well-formulated general actor system for distributed computing [2]. The

fundamental principle of the model is that actors communicate by message passing

with asynchronous message sending and synchronous receiving [121]. This idea of com-

munication via message-passing also formed a part of the original object-oriented pro-

gramming paradigm formulation and the associated language — Smalltalk [47, 67, 121].

Communication by message passing requires that only pass-by-value semantics apply

to the function invoked using it [67, 121]. Further, it is also necessary to ensure that

objects are fully encapsulated so that the shared mutable state associated with a

specific object may be modified only via that object's interface [2, 121]. In popular

modern object-oriented languages such as Java and C#, message passing has been reduced

to direct function invocation and parameters may now be object references, creating the

complicated mutable state access patterns seen today [121]. Actor model research has

most recently focussed on distributed computing and so can model parallelism. The


actor model maps well to the object-oriented paradigm and would allow for easy ex-

ploitation of coarse-grained parallelism. Further, the synchronization and parallelism

are completely transparent to the user. Unfortunately, the pass-by-value semantics

required by this model are prohibitively expensive for programs written in modern

object-oriented languages which rely on pass-by-reference semantics for performance

reasons. The strict encapsulation required would prohibit the use of many common

object-oriented design patterns as discussed by Nageli [90]. There has been some re-

search into combining Ownership Types with the actor concurrency model [26, 110].

Ownership is used to ensure that the messages passed between actors, and the

actors themselves, are fully encapsulated as required.

8.7.2.2 MPI

The Message Passing Interface (MPI) is the most popular scalable manual

parallelization tool in use today. It consists of an API for C, C++, and FORTRAN. The interface was

designed to provide a general message passing facility to its target platforms [77]. The

most significant problem with the use of MPI to express parallelism is the need to

significantly refactor existing programs and to provide explicit synchronization code.

The programmer is not assisted with these tasks which can be complex. Some have

criticized the MPI library for the complexity of its API and the lack of features to

specifically support parallelism [77], but the API provides all of the functionality

required and has been successfully used to express parallelism. MPI simplifies both the

implementation of parallelism and the management of distributed

parallelism. Unfortunately, the amount of information the API demands means that the

programmer must be very familiar with parallelization and

synchronization to use it properly.

8.7.2.3 Jade

Jade is a Java-like language which provides support for a message passing style of par-

allelism via parallel classes [37]. Parallel classes are somewhat like actors in that each

parallel class executes on its own thread and method invocations occur asynchronously.


As with MPI itself, code refactoring is required to make use of Jade’s parallelism fea-

tures. Further, programmers must explicitly identify which classes should be parallel

classes. This allows programmers to express parallelism, but there is no strong

validation to ensure that dependencies which could lead to data races cannot be created via

message passing. In addition, significant code refactoring may be required to increase

cohesion, reduce coupling, and make programs adhere to message passing conventions.

8.8 Object-Oriented Paradigm Considerations

Object-oriented programming employs a number of different techniques to reduce devel-

opment effort and increase reusability. Design patterns, most prominently championed

by Gamma, Helm, Johnson, and Vlissides [44], can offer opportunities for coarse-grained

parallelism. The prevalence and nature of inherent parallelism in modern object-

oriented programs is not well understood. In addition, many object-oriented languages

contain some meta-programming facilities. These facilities are frequently implemented

outside the language itself and they can cause encapsulation breaches amongst other

complications [21]. Modern use of object-oriented programming also emphasizes sepa-

rate compilation and software componentization which creates another set of challenges

for reasoning about parallelism. These different aspects of object-oriented programming

will be briefly explored in this section to demonstrate their impact on parallelism and

the need to consider them when designing solutions to the problem of how to express

parallelism.

8.8.1 Design Patterns

Modern software engineering makes great use of design patterns, standard arrangements of object interaction, to solve common classes of problems [44]. Many different kinds of patterns exist, addressing many different kinds of problems. For example, there are patterns for concurrency, object creation, and

collection processing [23]. Many of these patterns are designed with object-oriented

programming in mind and so they are one of the paradigm specific considerations

which influenced the design of my reasoning system [44, 23].


These patterns can provide extra information about how a program works and the dependencies it contains, offering a much higher-level understanding of programs and code fragments. This higher-level knowledge could be used to recognize opportunities to employ coarse-grained parallelism. Further, these patterns also provide

contextual information which may simplify some of the analysis required to exploit fine-

grained parallelism.

Currently, there is little knowledge about the prevalence of these patterns in actual

general purpose software and the amount of parallelism that could be extracted from

them. The most extensive study of parallelism patterns to date has been undertaken by

researchers at the University of California at Berkeley who have identified 13 common

parallelism patterns [9].

8.9 Summary

In this chapter I have presented a discussion of publications in six different areas which

are directly related to the work in this thesis.

Some of these systems provide abstractions that are well suited to reasoning about

data dependencies and parallelism. Others provide excellent abstractions for expressing

parallelism and frameworks for understanding program behavior. None of these systems combines abstraction and composition with parallelization and correctness checking to produce a framework which helps both programmers and automated tools to reason about inherent parallelism.


Chapter 9

Conclusion & Future Work

9.1 Summary

In this thesis, I have demonstrated how side-effects can be expressed in an abstract

and composable manner using the representation hierarchy present in programs writ-

ten using modern imperative object-oriented languages. I also demonstrated how data

dependencies could be inferred from the overlap of effects expressed using my effect

system. I have shown how my proposed effect system can be implemented using Own-

ership Types in Chapter 3 and have argued for the soundness of my proposals based

on previously published type systems in Chapter 5. Chapter 5 also sketched proofs of

the correctness of the runtime ownership relationship testing algorithms and sufficient

conditions for parallelization which are at the core of this thesis.
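The dependency test at the heart of this idea can be sketched in a few lines. This is an illustration only: the Effects record, the context names, and the independent helper below are mine, not Zal syntax. Two computations are independent when neither one’s write effects overlap the other’s read or write effects.

```java
import java.util.Collections;
import java.util.Set;

// Illustrative sketch only: effects are modelled as sets of context names.
// In Zal these would be ownership contexts drawn from method signatures.
public class EffectOverlap {

    record Effects(Set<String> reads, Set<String> writes) {}

    // Two computations may run in parallel when neither writes a context the
    // other reads or writes (a Bernstein-style condition on effect sets).
    static boolean independent(Effects x, Effects y) {
        return Collections.disjoint(x.writes(), y.reads())
            && Collections.disjoint(y.writes(), x.reads())
            && Collections.disjoint(x.writes(), y.writes());
    }

    public static void main(String[] args) {
        Effects m1 = new Effects(Set.of("this.data"), Set.of("this.cache"));
        Effects m2 = new Effects(Set.of("this.data"), Set.of("this.log"));
        Effects m3 = new Effects(Set.of("this.cache"), Set.of("this.data"));

        System.out.println(independent(m1, m2)); // true: shared read only
        System.out.println(independent(m1, m3)); // false: m3 reads what m1 writes
    }
}
```

The abstraction and composition described in this thesis concern how such effect sets are summarized and propagated across method boundaries; the disjointness test itself stays this simple.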

Chapter 4 discussed how my proposed type and effect system could be applied to

version 3.0 of the C# programming language to produce a language I have named Zal.

This application involved adding ownership type and effect information to a number of

syntactic constructs not previously considered in the literature. I have extended a full

C# compiler to produce a compiler for Zal. The design of this compiler was presented

in Chapter 6. In addition to the compiler itself, I wrote several additional support

libraries used to implement Zal and runtime ownership tracking in C#. In Chapter 7, I

presented a number of representative sample applications which showed that my system

was able to detect a number of different forms of inherent parallelism with reasonable

annotation and runtime overheads. Finally, in Chapter 8, I compared and contrasted


my work with other relevant related works from the literature. None of these other

systems combines abstraction and composition with parallelization and correctness

checking to produce a framework which helps both programmers and automated tools

to reason about inherent parallelism. As a result, this thesis makes a contribution to

the current state-of-the-art in parallelization techniques.

9.2 Contributions

The focus of this thesis is how to facilitate reasoning about the inherent parallelism

in strongly and statically typed, imperative, object-oriented languages. The main idea is

an abstract and composable effect system that can be used to specify side-effects as

part of a method’s signature and to then use this effect information to detect data

dependencies. The major contributions of this thesis are:

• The design of the Zal language — a novel language adapting features from Ownership Types to capture, abstract, and validate side-effects in a real language on real programs, facilitating the detection of inherent parallelism.

• Developing and proving sufficient conditions for parallelism in terms of ownership

effects and relationships for a number of different parallelism patterns.

• The design and implementation of a runtime ownership system which complements static reasoning about ownership context relationships to facilitate conditional parallelization of code blocks.

• Empirical evaluation of the systems designed to demonstrate the practicality of

the approach proposed.

This thesis has also made a number of smaller, more technical contributions which

have arisen during the validation of my ideas on realistic applications. I have applied

ideas from Ownership Types to the full C# language including handling static fields

and methods, indexers, properties, delegates, and user-defined value types; language

features not previously discussed in the ownership literature. As part of the process


of making and validating these contributions, I have created a compiler for the Zal

language which is a version of C# 3.0 extended with my ideas. In building this compiler, I added support for arbitrary type parameters to the GPC# research compiler [32]. This compiler and its type parameter infrastructure are available from the MQUTeR website [32]. Finally, I have demonstrated, through the application of my

ideas to representative sample applications, that my proposed approach to reasoning

about side-effects and parallelism is feasible.

9.3 Conclusions

In this thesis I have developed a novel framework which can be used to discover inherent

parallelism in programs written using modern, imperative, object-oriented languages. I

have demonstrated the feasibility of my proposed techniques through their application

to the C# programming language and a number of realistic sample applications. This work represents a first step towards reasoning about side-effects and data dependencies in an abstract and composable manner. Do I claim that these techniques are ready for practical application? No; after analyzing the validation results, I have identified three categories of potential barriers to practical application:

• annotation overhead — as with other kinds of type annotations, Ownership

Types can be criticized for increasing the amount of code a developer needs to

write and complicating the construction of valid types in programs [1, 85];

• loop parallelism limitations — there are a number of syntactic and semantic

restrictions on the loops which my system can parallelize; and

• language limitations — the language model restricts the syntax and semantics

allowed in the base programming language extended with ownership annotations.

There are ways of addressing each of these categories of potential barriers. In the rest

of this section I elaborate on some of the possible avenues for addressing these potential

barriers.


9.3.1 Memory Overhead

The memory overhead measured for the runtime ownership system is non-trivial. It

would be interesting to explore the sources of this overhead more closely. It may be

possible to significantly reduce the overhead by modifying the Common Language Runtime (CLR) to help facilitate this runtime ownership tracking. It may also be possible

to further optimize the packing of the data structures to minimize memory use once

the runtime ownership tracking fields are added to objects.

9.3.2 Annotation Overhead

Like any type annotation system, Ownership Types can be criticized for increasing the

amount of code a developer needs to write and complicating the construction of valid

types in programs [1, 85]. The cost of adding type annotations needs to be balanced by

the benefits obtained from doing so. Further reducing the annotation overhead would

contribute significantly towards making my proposals practical for everyday program-

ming tasks. In this section I will discuss ownership inference, ownership transfer, and

temporary owners for transient objects as avenues for reducing the annotation burden.

9.3.2.1 Ownership Inference

At present the type system used to provide the framework for localized reasoning about

program properties requires all types to be parameterized with context information.

This imposes a significant burden on the programmer since all types in an application

require annotation with ownership information. One way to relieve at least some of

this burden would be to build an ownership inference system. Ownership inference is

currently an active field of research in the Ownership Types community [85, 84, 1] and

it promises to be an interesting avenue for future work.

9.3.2.2 Ownership Transfer

With the type system currently included in my system, there are some programs which

cannot be properly annotated with ownership parameters. For example, annotating


a program which implements an abstract factory pattern would not be possible. In

2006, Nägeli published a master’s thesis which studied how many of the Gang of Four

design patterns the then-current state-of-the-art ownership systems could implement.

He proposed a number of additional language features which could greatly increase the

number of design patterns that these systems could handle and central to these exten-

sions was the notion of ownership transfer. Ownership transfer would allow an object’s

owner to be changed after declaration and could be used to implement a number of

different patterns such as the Factory. Some work has been done on ownership transfer in Universe Type systems by Wrigstad [127] as well as by Müller and Rudich [89].

Quite how these ideas could be retrofitted on to Zal remains to be seen, but would be

an interesting avenue of future study to pursue and would allow more programs to be

successfully annotated using my framework.

9.3.2.3 Temporary Owners for Transient Objects

Programs written in modern imperative object-oriented languages sometimes create

transient objects as part of a computation. These transient objects are not stored

in any fields and do not outlive the scope in which they are created. They are used

for a number of purposes from marshalling data to converting data types. Currently

Zal, as with most other ownership systems, lacks a special context which can be

used to own these temporary objects. They must be assigned a regular owner as is

any other object. This can cause an increase in the number of false-positive data

dependencies reported when testing effect sets for disjointness. Further, the lack of a

special owner does not capture the fact that a given object is expected to be transient

by the programmer. Capturing this information would allow the compiler to detect

if the programmer violates their assumptions about the use of such objects. This, in

turn, may help detect bugs during compilation. How to add such a transient owner

is not currently clear. It may be possible to adapt some of the ideas presented in

“Existential Owners for Ownership Types” to this purpose [128]. Ensuring the safety

and correctness of Zal while preserving the ability to reason about context relationships

are both areas for future study.


9.3.3 Loop Parallelism Limitations

There are a number of syntactic and semantic restrictions on the loops which my sys-

tem can parallelize. Firstly, there is a need to look beyond data parallel loops. All

of the sufficient conditions for loop parallelism in the framework apply only to data

parallel loops. Secondly, the handling of the element uniqueness sufficient condition in

my framework at present is not ideal. Finally, only foreach and enhanced foreach

loops are handled by the framework I have proposed. In this section, I present ideas

for possible future avenues of research which may help to address some of these short-

comings.

9.3.3.1 Improved Handling of Collections

Consider the simplest expression for reading an element from an array which has the

form a[i]. When computing the read and write effects, there would be a read of the

owner of the array a and that effect would not change with the value of i supplied.

There are no means of describing effects at the level of individual elements of a col-

lection. Adding this support could also provide improved support for enforcing the

collection uniqueness requirement and so validating this programmer assertion.
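To make the granularity problem concrete, here is a small illustration in plain Java, with no ownership annotations; the effect descriptions appear only as comments. An effect system that records merely “writes the owner of a” cannot distinguish a loop whose iterations touch disjoint elements from one with a genuine loop-carried dependence.

```java
// Illustration (not Zal syntax) of the granularity problem described above:
// an effect system that records only "writes the owner of a" cannot tell
// these two loop bodies apart, even though the first is trivially parallel.
public class ElementEffects {

    // Parallel-safe: iteration i touches only a[i], but a whole-array
    // write effect makes every pair of iterations appear to conflict.
    static void scale(int[] a, int factor) {
        for (int i = 0; i < a.length; i++) {
            a[i] = a[i] * factor;          // effect: reads and writes owner of a
        }
    }

    // Genuinely loop-carried: iteration i reads a[i - 1], so here the same
    // whole-array effect is an accurate description of the dependence.
    static void prefixSum(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i] + a[i - 1];        // effect: reads and writes owner of a
        }
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        scale(x, 2);        // x becomes {2, 4, 6}
        prefixSum(x);       // x becomes {2, 6, 12}
        System.out.println(java.util.Arrays.toString(x));
    }
}
```

Element-level effects, whether via value-parameterized contexts or DPJ-style region notation, would let the first loop be proved independent while still rejecting the second.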

One possible avenue for addressing this problem would be to parameterize contexts in

effect sets with values in a manner similar to dependent types. This would allow effects

on collections to be parameterized with a value to describe which parts of the collection

are being modified.

Another possible avenue would be to adapt some of the ideas from Deterministic Parallel

Java (DPJ) for use in Zal [18]. This would avoid the complications associated with

dependent types. How to facilitate effect composition using DPJ style effect notations

is an open question which would have to be answered.

9.3.3.2 Light’s Associativity Test

In Chapter 2, I introduced loop parallelism by stating that there were three basic

classes of loop operations: map, reduce, and filter. The sufficient conditions presented

and proved in this thesis have focused almost exclusively on mapping operations — the


major source of parallelism in imperative programs. There are, however, performance gains to be made by parallelizing other loop patterns as well, although these gains may be smaller than those obtained by parallelizing the execution of mapping loops.

Of particular interest is the reduction pattern where the loop operation takes two ele-

ments of the collection and produces a single value. When such an operation is applied

across an entire collection, it reduces the collection to a single value. Examples of such

an operation might be the summation of a collection of integers or the concatenation

of a collection of strings. To parallelize such an operation, the reduction order needs to

be changed. Unfortunately, not all reduction operations will produce the same result

if the order of application to the collection is modified.
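The order sensitivity just described is easy to demonstrate. The sketch below, in plain Java with helper names of my own choosing, compares a sequential left-to-right fold with a pairwise tree-shaped reduction, the shape a parallel runtime would naturally use: the two agree for addition but not for subtraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

// Demonstrates why reduction order matters: a sequential left fold and a
// pairwise "tree" reduction agree only when the operator is associative.
public class ReductionOrder {

    static int leftFold(List<Integer> xs, IntBinaryOperator op) {
        int acc = xs.get(0);
        for (int i = 1; i < xs.size(); i++) acc = op.applyAsInt(acc, xs.get(i));
        return acc;
    }

    // Combines adjacent pairs level by level, as a parallel runtime would.
    static int treeReduce(List<Integer> xs, IntBinaryOperator op) {
        List<Integer> level = new ArrayList<>(xs);
        while (level.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i + 1 < level.size(); i += 2)
                next.add(op.applyAsInt(level.get(i), level.get(i + 1)));
            if (level.size() % 2 == 1) next.add(level.get(level.size() - 1));
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        // Addition is associative: both orders give 10.
        System.out.println(leftFold(xs, Integer::sum) == treeReduce(xs, Integer::sum));
        // Subtraction is not: ((1-2)-3)-4 = -8 but (1-2)-(3-4) = 0.
        System.out.println(leftFold(xs, (a, b) -> a - b));   // -8
        System.out.println(treeReduce(xs, (a, b) -> a - b)); // 0
    }
}
```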

Mathematicians call operations associative when their order of application can be

changed without affecting the result computed, provided the operands remain in the

same order. Proving that an operation is associative is a non-trivial process in the

general case and is a problem studied in some detail by mathematicians. One asso-

ciativity test for an operation documented in the mathematical literature is Light’s

Associativity Test [29] later improved by Bednarek [14] and Kalman [66]. This test

allows an operator to be proved to be associative over a fixed domain in a mechani-

cal manner. The test requires the operation being tested to be applied to a number

of pairs of elements in the domain being tested and the results compared. With the

addition of an effects system to a programming language, it would become possible to

determine when such a test could be safely performed without modifying the state of a

program. Such a test could be used to dynamically determine whether a reduction operation is associative, and so whether loops employing it are amenable to parallelization. There would

be serious overheads involved in running such a test and determining when and if it

is worth testing an operation would require additional study. Once implemented, a

functioning runtime Light’s Associativity Test could be used in several different ways

including automated parallelization of reduction loops in specific circumstances and

verification of programmer supplied annotations asserting operation associativity.
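To give a sense of the mechanical flavour of such a test, the sketch below brute-forces the associativity check over a small fixed domain by comparing (a∘b)∘c with a∘(b∘c) for every triple. The actual Light test, with the Bednarek and Kalman refinements, reduces this work by checking only against a generating set; a runtime version would additionally rely on the effect system to confirm the operation is side-effect free before running it.

```java
import java.util.function.IntBinaryOperator;

// Naive mechanical associativity check over the fixed finite domain {0..n-1}.
// Light's test checks fewer combinations (only against a generating set),
// but the principle is the same: apply the operation to element triples in
// both orders and compare the results.
public class AssociativityCheck {

    static boolean isAssociative(IntBinaryOperator op, int n) {
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++)
                for (int c = 0; c < n; c++) {
                    int left  = op.applyAsInt(op.applyAsInt(a, b), c); // (a∘b)∘c
                    int right = op.applyAsInt(a, op.applyAsInt(b, c)); // a∘(b∘c)
                    if (left != right) return false;
                }
        return true;
    }

    public static void main(String[] args) {
        int n = 16;
        System.out.println(isAssociative((a, b) -> (a + b) % n, n)); // true
        System.out.println(isAssociative((a, b) -> a - b, n));       // false
    }
}
```

As the text notes, the cost of such a check is cubic in the domain size even before Light’s optimization, so deciding when it is worth running is itself a research question.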


9.3.4 Language Limitations

Finally, the language model restricts the syntax and semantics allowed in the base

programming language extended with ownership annotations. The system presented in

this thesis does not permit unrestricted references into the program stack nor does it

support explicit pointer types. To increase the number and style of languages to which

my techniques can be applied, it would be desirable to add support for these language

features. While this work would not make the system more practical for use in one of

the languages already considered, such as Java or C#, it would allow the techniques to

be applied to other less structured languages.

9.3.4.1 Liberalization of the Stack Model

The stack model used to develop the system for reasoning about side-effects, depen-

dencies and inherent parallelism in this thesis was restricted. I required the heap to be

orthogonal to the stack and access into the stack to be tightly controlled. While Java

and the safe subset of the C# language fit such a model, there are many other popular

imperative object-oriented languages which do not; for example, C++ and the unsafe

subset of C#.

There are a number of possible approaches that could be taken to liberalize the stack

model. One option would be to extend ownerships to include both the stack and the

heap. One approach to doing this would be to have stack contexts which correspond

to the scopes of stack variables with the hierarchy of the contexts corresponding to the

scope nesting hierarchy. References into the stack would need to be annotated with

context parameters corresponding to the stack scope being referenced. Precisely how

such a system would operate requires additional study, but offers hope that the system

can be modified to operate on languages employing more liberal stack models.

9.3.4.2 Handling Unsafe Code Blocks

In implementing Zal, I consciously focused only on the safe subset of the C# language

— code which does not make use of pointers. It would be interesting to explore how

to extend Zal to also support unsafe code blocks. The unsafe language subset includes


pointers and it operates on a weaker stack model where references to arbitrary stack

locations can be taken and so such an extension would likely be non-trivial.

It would be interesting to explore if a notation similar to that used in Cyclone [117]

could be used to annotate pointer data types with information about what areas of

the stack and heap they can refer to. The ownership parameters on a pointer type

could be used to determine what areas of memory are being read when a de-referencing

operation is performed, not unlike the reading of fields from reference types in Zal.

9.3.4.3 Multiple Ownerships to Model Communication Channels

One of the most recent developments in the field of Ownership Types is type systems

which permit objects to have multiple owners [25]. These type systems can provide

greater flexibility than single ownership systems. This means that they can handle

patterns not easily expressed in single ownership languages. One of the interesting

questions is what multiple ownership means or represents in a system, like that proposed

in this thesis, which uses contexts to reason about side-effects and dependencies.

In my opinion, one possible interpretation is that they represent communication chan-

nels between objects in two or more different contexts; the end points of the channel

being the context’s owners. Such a communication channel could be used in many

ways. One example would be for message passing style communication between differ-

ent threads of execution in parallel program. Figure 9.1 shows two objects with single

owners (labelled a and b) communicating via an object jointly owned by a and b.

Figure 9.1: Illustration of two singly owned objects a and b communicating via a jointly owned context, labelled a&b.
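One way to picture the jointly owned context of Figure 9.1 is as a message channel that both endpoint objects may legitimately touch. The sketch below uses plain Java and a standard blocking queue; the ownership reading appears only in the comments. The two threads’ only shared effects fall on the channel, so an effect system could verify that all cross-thread communication flows through the jointly owned context.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of Figure 9.1: each thread owns its private state, while the
// channel plays the role of the jointly owned "a&b" context through which
// all cross-thread communication flows.
public class JointChannel {

    // Sends msg from a producer thread (object a) to the caller (object b)
    // through the jointly owned channel.
    static String roundTrip(String msg) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(1); // owner: a&b
        Thread producer = new Thread(() -> {                          // object a
            try {
                channel.put(msg);    // effect falls on the shared context only
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            producer.start();
            String received = channel.take();                         // object b
            producer.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("work-item"));
    }
}
```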


9.4 Summation

In this thesis I have presented a system which combines abstraction and composition

with parallelization and correctness checking to produce a novel framework for reason-

ing about inherent parallelism in imperative object-oriented programs. This framework

provides a new way of looking at data dependency analysis, through the use of side-

effects rather than pairwise comparison of statements. This helps both programmers

and automated tools reason about inherent parallelism. The key idea behind this frame-

work and this new perspective is the extension of the programming language to create

a scaffold for localized reasoning about program properties.

I have demonstrated through my proof-of-concept implementation that these ideas are

feasible. Through reflection on the results of my system validation, I believe that

there are three main barriers to the use of my ideas in a practical system: annotation

overhead, loop parallelism limitations, and language limitations. All of these barriers

relate to a cost-benefit analysis: is the cost of annotating the program with additional information worth the benefits obtained from the parallelization? Earlier in this chapter, I proposed a number of ideas which could be explored to help reduce the annotation cost as well as increase the parallelization benefit. However, the language

supported framework for localized reasoning about program properties is not limited to

applications in the domain of parallelism. A number of techniques in computer science

could benefit from this approach including program optimization, program verification,

and memory management. With additional work, the cost of annotating a program may be reduced while the benefits obtained from the simplified reasoning about program invariants and behavior increase. The net result would be that the benefit outweighs the cost, producing a system practical for use in everyday computing.


Bibliography

[1] Marwan Abi-Antoun and Jonathan Aldrich. Compile-time views of execu-

tion structure based on ownership. In Proceedings of the International Work-

shop on Aliasing, Confinement, and Ownership in Object-oriented Programming

(IWACO) 2007, pages 93–104, 2007.

[2] G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems.

PhD thesis, Massachusetts Institute of Technology, 1985.

[3] Jonathan Aldrich and Craig Chambers. Ownership domains: Separating aliasing

policy from mechanism. In ECOOP 2004 — Object-oriented Programming, pages

1–25, 2004.

[4] V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan. Software pipelining. ACM

Computing Survey, 27(3):367–432, 1995.

[5] P. S. Almeida. Balloon types: Controlling sharing of state in data types. In 11th

European Conference on Object-Oriented Programming, pages 48–64, 1997.

[6] L. O. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, University of Copenhagen, Denmark, DIKU Report 94/19, 1994.

[7] J. Armstrong. The development of Erlang. In Proceedings of the second ACM

SIGPLAN International Conference on Functional Programming, pages 196–203,

New York, NY, USA, 1997. ACM.

[8] Pedro V. Artigas, Manish Gupta, Samuel P. Midkiff, and Jose E. Moreira. Au-

tomatic loop transformations and parallelization for Java. In Proceedings of the

14th international conference on Supercomputing, pages 1–10, New York, NY,

USA, 2000. ACM.


[9] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis,

Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker,

John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report

UCB/EECS-2006-183, Electrical Engineering and Computer Sciences, University

of California at Berkeley, December 2006.

[10] U. Banerjee, R. Eigenmann, A. Nicolau, and D.A. Padua. Automatic program

parallelization. Proceedings of the IEEE, 81(2):211–243, February 1993.

[11] Utpal K. Banerjee. Dependence Analysis for Supercomputing. Kluwer Academic

Publishers, 1988.

[12] Gilad Bracha. Pluggable type systems. Proceedings of the OOPSLA04 Workshop

on Revival of Dynamic Languages, 2004.

[13] Mike Barnett, K. Rustan M. Leino, and Wolfram Schulte. The Spec# programming system: An overview. In Proceedings of the Construction and Analysis of Safe,

Secure, and Interoperable Smart Devices International Workshop, pages 49–69,

2004.

[14] A. R. Bednarek. An extension of Light’s associativity test. The American Math-

ematical Monthly, 75(5):531–532, May 1968.

[15] A. J. Bernstein. Analysis of programs for parallel processing. IEEE Transactions

on Electronic Computers, EC-15(5):757–763, October 1966.

[16] Graham M. Birtwistle, Ole-Johan Dahl, Bjørn Myhrhaug, and Kirsten Nygaard.

SIMULA Begin. Auerbach Publishers Inc., Philadelphia, PA, 1973.

[17] W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, and T. Lawrence.

Parallel programming with Polaris. Computer, 29(12):78–82, December 1996.

[18] Robert L. Bocchino, Jr., Vikram S. Adve, Danny Dig, Sarita V. Adve, Stephen

Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Simmons, Hyojin Sung,

and Mohsen Vakilian. A type and effect system for Deterministic Parallel Java.

In OOPSLA ’09: Proceeding of the 24th ACM SIGPLAN conference on Object


oriented programming systems languages and applications, pages 97–116, New

York, NY, USA, 2009. ACM.

[19] Chandrasekhar Boyapati. SafeJava: A Unified Type System for Safe Program-

ming. PhD thesis, Massachusetts Institute of Technology, 2004.

[20] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for

safe programming: Preventing data races and deadlocks. In OOPSLA ’02: Pro-

ceedings of the 17th ACM SIGPLAN conference on Object-oriented programming,

systems, languages, and applications, pages 211–230, New York, NY, USA, 2002.

ACM.

[21] G. Bracha and D. Ungar. Mirrors: Design principles for meta-level facilities in

object-oriented programming languages. In 19th annual ACM SIGPLAN con-

ference on Object-Oriented Programming, Systems, Languages, and Applications,

pages 331–344. ACM Press, 2004.

[22] Martin Bravenboer and Yannis Smaragdakis. Strictly declarative specification of

sophisticated points-to analyses. SIGPLAN Not., 44(10):243–262, 2009.

[23] B. Bruegge and A. H. Dutoit. Object-Oriented Software Engineering Using UML.

Pearson Education, Upper Saddle River, NJ, 2 edition, 2004.

[24] Nicholas Cameron. Existential Types for Variance - Java Wildcards and Ownership Types. PhD thesis, Imperial College London, 2008.

[25] Nicholas Cameron, Sophia Drossopoulou, James Noble, and Matthew Smith.

Multiple ownership. In Proceedings of the 2007 OOPSLA conference, volume 42,

pages 441–460. ACM, 2007.

[26] Dave Clarke, Tobias Wrigstad, Johan Östlund, and Einar Broch Johnsen. Minimal ownership for active objects. In Proceedings of the 6th Asian Symposium on

Programming Languages and Systems, pages 139–154. Springer-Verlag Berlin Hei-

delberg, 2008.


[27] David Clarke and Sophia Drossopoulou. Ownership encapsulation and the dis-

jointness of type and effect. In OOPSLA ’02: Proceedings of the 17th ACM SIG-

PLAN Conference on Object-Oriented Programming, Systems, Languages and

Applications, pages 292–310, New York, NY, 2002. ACM Press.

[28] David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible

alias protection. In 13th ACM SIGPLAN conference on Object-Oriented Program-

ming, Systems, Languages and Applications, pages 48–64. ACM Press, October

18-22 1998.

[29] A. H. Clifford and G. B. Preston. The Algebraic Theory of Semigroups, vol-

ume 1 of Mathematical Surveys and Monographs. American Mathematical Soci-

ety, Providence, RI, 1961.

[30] N. H. Cohen. Type-extension type tests can be performed in constant time. ACM

Transactions on Programming Languages and Systems, 13(4):626–629, October

1991.

[31] A. Craik and W. Kelly. Using ownership to reason about inherent parallelism in

object-oriented programs. International Conference on Compiler Construction,

2010.

[32] Andrew Craik and Wayne Kelly. MQuTeR parallelism research.

http://www.mquter.qut.edu.au/par/, 2009.

[33] Cray Inc. Chapel Language Specification 0.795. Cray Inc., Seattle, WA, USA,

April 2010.

[34] D. Cunningham, S. Drossopoulou, and S. Eisenbach. Universe types for race

safety. In Verification and Analysis of Multi-threaded Java-like Programs, pages

20–51, 2007.

[35] Defense Advanced Research Projects Agency. High Productivity Computing Systems

(HPCS) Program Plan. Defense Advanced Research Projects Agency, 2006.

[36] Peter J. Denning and Jack B. Dennis. The resurgence of parallelism. Communi-

cations of the ACM, 53(6):30–32, June 2010.


[37] Jayant DeSouza and Laxmikant V. Kale. Jade: A parallel message-driven Java.

In G. Goos, J Hartmanis, and J. van Leeuwen, editors, Computational Science —

International Conference on Computer Science 2003, pages 760–769. Springer-

Verlag Berlin Heidelberg, 2003.

[38] Werner Michael Dietl. Universe Types — Topology, Encapsulation, Genericity,

and Tools. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.

[39] E. W. Dijkstra. Recursive programming. Numerische Mathematik, 2(1):312–318,

December 1960.

[40] Edsger Dijkstra. Go to statement considered harmful. Communications of the

ACM, 11(3):147–148, 1968.

[41] S. Drossopoulou, D. Clarke, and J. Noble. Types for hierarchic shapes (summary).

In P. Sestoft, editor, European Symposium on Object-Oriented Programming 2006,

volume LNCS 3924, pages 1–6. Springer-Verlag Berlin Heidelberg, 2006.

[42] Nicu G. Fruja. Towards proving type safety of C#. Computer Languages, Systems

& Structures, 36:60–95, 2010.

[43] Michael Furr, Jong-hoon (David) An, Jeffrey S. Foster, and Michael Hicks. Static

type inference for Ruby. In SAC ’09: Proceedings of the 2009 ACM symposium

on Applied Computing, pages 1859–1866, New York, NY, USA, 2009. ACM.

[44] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of

Reusable Object-Oriented Software. Addison-Wesley Professional,

1995.

[45] D. K. Gifford, P. Jouvelot, M. A. Sheldon, and J. W. O’Toole. Report on the

FX programming language. Technical Report MIT/LCS/TR-531, Massachusetts

Institute of Technology, Cambridge, MA, USA, 1992.

[46] Gina Goff, Ken Kennedy, and Chau-Wen Tseng. Practical dependence testing.

ACM SIGPLAN Notices, 26(6):15–29, 1991.

[47] Adele Goldberg and David Robson. Smalltalk-80: The Language. Addison-Wesley

Series in Computer Science. Addison-Wesley, Reading, MA, 1989.


[48] K. John Gough. The GPLEX scanner generator. Technical report, Queensland

University of Technology, December 2009.

[49] A. Greenhouse and J. Boyland. An object-oriented effects system. In Proceedings

of the European Conference on Object-Oriented Programming ’99, 1999.

[50] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-

based memory management in Cyclone. In 29th ACM SIGPLAN conference on

Programming Language Design and Implementation, pages 282–293, Berlin, Ger-

many, 2002. ACM.

[51] Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and

Monica S. Lam. Interprocedural parallelization analysis in SUIF. ACM Transac-

tions on Programming Languages and Systems, 27(4):662–731, July 2005.

[52] P. Haller and M. Odersky. Event-based programming without inversion of control.

In 7th Joint Modular Languages Conference (JMLC), volume LNCS 4228, pages

4–22. Springer-Verlag Berlin Heidelberg, 2006.

[53] R. H. Halstead. Multilisp: A language for concurrent symbolic computation.

ACM Transactions on Programming Languages and Systems (TOPLAS), 7:501–

538, 1985.

[54] Jonathan Harris, John A. Bircsak, M. Regina Bolduc, Jill Ann Diewald, Israel

Gale, Neil W. Johnson, Shin Lee, C. Alexander Nelson, and Carl D. Offner.

Compiling high performance Fortran for distributed-memory systems. Digital

Tech. J., 7(3):5–23, January 1995.

[55] C. Hewitt, P. Bishop, and R. Steiger. A universal modular actor formalism for

artificial intelligence. In 3rd International Joint Conference on Artificial Intelli-

gence, pages 235–245, 1973.

[56] C. A. R. Hoare. An axiomatic basis for computer programming. Communications

of the ACM, 12(10):576–580, 1969.

[57] John Hogg. Islands: Aliasing protection in object-oriented languages. In OOP-

SLA ’91: Conference proceedings on Object-oriented programming systems, lan-

guages, and applications, pages 271–285, New York, NY, USA, 1991. ACM.


[58] Susan Horwitz. Precise flow-insensitive may-alias analysis is NP-hard. In ACM

Transactions on Programming Languages and Systems, volume 19, pages 1–6.

ACM, 1997.

[59] P. Hudak, J. Hughes, S. P. Jones, and P. Wadler. A history of Haskell: Being lazy

with class. In 3rd ACM SIGPLAN conference on the History of Programming

Languages, pages 12–1—12–55, San Diego, CA, 2007. ACM.

[60] Atsushi Igarashi, Benjamin C. Pierce, and Philip Wadler. Featherweight Java:

A minimal core calculus for Java and GJ. ACM Transactions on Programming

Languages and Systems, 23(3):396–450, May 2001.

[61] S. Ishtiaq and P. W. O’Hearn. BI as an assertion language for mutable data struc-

tures. In Conference Record of POPL 2001: The 28th SIGPLAN-SIGACT Sym-

posium on Principles of Programming Languages, New York, NY, 2001. ACM.

[62] Simon Jensen, Anders Møller, and Peter Thiemann. Type analysis for JavaScript. In

Jens Palsberg and Zhendong Su, editors, Static Analysis, volume 5673 of Lecture

Notes in Computer Science, pages 238–255. Springer Berlin / Heidelberg, 2009.

[63] T. Jim, G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang.

Cyclone: A safe dialect of C. In General Track: 2002 USENIX Annual Technical

Conference, pages 275–288, Monterey, CA, USA, 2002.

[64] S. P. Jones. Engineering Theories of Software Construction, chapter Tackling the

Awkward Squad: Monadic Input/Output, Concurrency, Exceptions, and Foreign-

Language Calls in Haskell, pages 47–96. IOS Press, 2001.

[65] H. V. Jula. ASM semantics for C# 2.0. In ASM’05: Proceedings of the 12th

workshop on abstract state machines, Paris, France, 2005.

[66] J. A. Kalman. Bednarek’s extension of Light’s associativity test. Semigroup Fo-

rum, 3:275–276, 1971.

[67] A. C. Kay. The early history of Smalltalk. In 2nd ACM SIGPLAN Conference

on the History of Programming Languages, pages 69–95, Cambridge, MA, USA,

1993. ACM.


[68] W. Kelly. Optimization within a Unified Transformation Framework. PhD thesis,

Faculty of Graduate Studies, University of Maryland, College Park, MD, 1996.

[69] Neel Krishnaswami and Jonathan Aldrich. Permission-based ownership: encap-

sulating state in higher-order typed languages. In PLDI ’05: Proceedings of the

2005 ACM SIGPLAN conference on Programming language design and imple-

mentation, pages 96–106, New York, NY, USA, 2005. ACM.

[70] Kazuhiro Kusano and Mitsuhisa Sato. A comparison of automatic paralleliz-

ing compiler and improvements by compiler directives. In Constantine Poly-

chronopoulos, Kazuki Fukuda, and Shinji Tomita, editors, High Performance

Computing, volume 1615 of Lecture Notes in Computer Science, pages 95–108.

Springer Berlin / Heidelberg, 1999.

[71] Peeter Laud, Tarmo Uustalu, and Varmo Vene. Type systems equivalent to

data-flow analyses for imperative languages. Theoretical Computer Science,

364(3):292–310, November 2006.

[72] Shih-Wei Liao, Amer Diwan, Robert P. Bosch, Jr., Anwar Ghuloum, and Mon-

ica S. Lam. SUIF Explorer: an interactive and interprocedural parallelizer. In

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice

of parallel programming, pages 37–48, New York, NY, USA, 1999. ACM.

[73] Yi Lu and John Potter. Protecting representation with effect encapsulation. In

POPL ’06: Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium

on Principles of Programming Languages, pages 359–371, New York, NY, 2006.

ACM Press.

[74] Yi Lu, John Potter, and Jingling Xue. Validity invariants and effects. In ECOOP

2007 — Object-Oriented Programming, pages 202–226. Springer-Verlag Berlin

Heidelberg, August 2007.

[75] Dror E. Maydan, John L. Hennessy, and Monica S. Lam. Efficient and exact data

dependence analysis. SIGPLAN Not., 26(6):1–14, 1991.

[76] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineer-

ing, 2(4):308–320, 1976.


[77] Message Passing Interface Forum. MPI-2: Extensions to the Message Passing

Interface. University of Tennessee, Knoxville, TN, USA, 1997.

[78] Microsoft Corporation. C# Language Specification Version 3.0. Microsoft Corpo-

ration, 2007.

[79] Microsoft Corporation. Introduction to PLINQ. http://msdn.microsoft.com/

en-us/library/dd997425.aspx, April 2010.

[80] Microsoft Corporation. .NET Framework 4 — Task Parallel Library. http:

//msdn.microsoft.com/en-us/library/dd460717.aspx, April 2010.

[81] Microsoft Corporation. Potential pitfalls with PLINQ. http://msdn.microsoft.

com/en-us/library/dd997403.aspx, April 2010.

[82] Microsoft Corporation. Samples for parallel programming with the .NET Frame-

work 4. http://code.msdn.microsoft.com/ParExtSamples, May 2010.

[83] A. Milanova. Static inference of universe types. In International Work-

shop on Aliasing, Confinement and Ownership in Object-Oriented Programming

(IWACO), 2008.

[84] Ana Milanova and Yin Liu. Practical static ownership inference. Technical Report

RPI/DCS-09-04, Rensselaer Polytechnic Institute, Sept. 2009.

[85] Ana Milanova and Yin Liu. Static ownership inference for reasoning against

concurrency errors. In International Conference on Software Engineering 2009,

pages 279–282. IEEE, 2009.

[86] Ana Milanova, Atanas Rountev, and Barbara G. Ryder. Parameterized object

sensitivity for points-to analysis for Java. ACM Transactions on Software Engi-

neering and Methodology, 14(1):1–41, January 2005.

[87] Gordon E. Moore. Cramming more components onto integrated circuits. Elec-

tronics, 38(8):4–7, April 1965.

[88] Peter Müller and A. Poetzsch-Heffter. A type system for controlling representa-

tion exposure in Java. Technical Report 269, Fernuniversität Hagen, 2000.


[89] Peter Müller and A. Rudich. Ownership transfer in universe types. In Object-

Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages

461–478. ACM Press, 2007.

[90] Stefan Nägeli. Ownership in design patterns. Master’s thesis, Software Technol-

ogy Group, Department of Computer Science, ETH Zurich, 2006.

[91] James Noble, Jan Vitek, and John Potter. Flexible alias protection. In Proceedings

of the European Conference on Object-Oriented Programming 1998, volume LNCS

1445, pages 158–185. Springer-Verlag Berlin Heidelberg, 1998.

[92] Kristen Nygaard and Ole-Johan Dahl. The development of the SIMULA languages.

In Richard L. Wexelblat, editor, Proceedings of the ACM SIGPLAN History of

Programming Languages Conference, ACM Monograph Series, pages 439–493,

Los Angeles CA, 1978.

[93] M. Odersky, P. Altherr, V. Cremet, I. Dragos, G. Dubochet, B. Emir,

S. McDirmid, S. Micheloud, N. Mihaylov, M. Schinz, E. Stenman,

L. Spoon, and M. Zenger. An overview of the Scala programming language,

2nd edition. Technical Report LAMP-REPORT-2006-001, École Polytechnique

Fédérale de Lausanne, Lausanne, Switzerland, 2006.

[94] OpenMP Architecture Review Board. OpenMP Application Programmer Inter-

face version 2.5. OpenMP Architecture Review Board, 2005.

[95] Alex Potanin. Generic Ownership — A Practical Approach to Ownership and

Confinement in OO Programming Languages. PhD thesis, School of Engineering

and Computer Science, Victoria University of Wellington, 2007.

[96] W. Pugh. Skip lists: A probabilistic alternative to balanced trees. Communica-

tions of the ACM, 33(6):668–676, June 1990.

[97] W. Pugh. The Omega Test: A fast and practical integer programming algorithm

for dependence analysis. In 1991 ACM/IEEE Conference on Supercomputing,

pages 4–13. ACM, 1991.


[98] William Pugh and David Wonnacott. Eliminating false data dependences using

the Omega Test. In PLDI ’92: Proceedings of the ACM SIGPLAN 1992 confer-

ence on Programming language design and implementation, pages 140–151, New

York, NY, USA, 1992. ACM.

[99] Mohammad Raza, Cristiano Calcagno, and Philippa Gardner. Automatic paral-

lelization with separation logic. In 18th European Symposium on Programming,

ESOP 2009, volume LNCS 5502/2009, pages 348–362, 2009.

[100] W. Reid, W. Kelly, and A. Craik. Reasoning about data parallelism in mod-

ern object-oriented languages. Australasian Computer Science Conference 2008,

2008.

[101] J. C. Reynolds. Intuitionistic reasoning about shared mutable data structures.

In J. Davies, B. Roscoe, and J. Woodcock, editors, Millennial Perspectives in

Computer Science, pages 202–321. Palgrave, Houndmills, Hampshire, 2000.

[102] John C. Reynolds. Separation logic: A logic for shared mutable data structures. In

Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science

(LICS’02), pages 55–74, 2002.

[103] Vijay Saraswat. Report on the Programming Language X10 - Version 2.0.3. IBM,

April 2010.

[104] J. Schäfer and A. Poetzsch-Heffter. A parameterized type system for simple loose

ownership domains. Journal of Object Technology, 6(5):71–100, 2007.

[105] J. Schäfer and A. Poetzsch-Heffter. CoBoxes: Unifying active objects and struc-

tured heaps. Formal Methods for Open Object-Based Distributed Systems, 5051

of LNCS:201–219, 2008.

[106] J. Schäfer, M. Reitz, J.-M. Gaillourdet, and A. Poetzsch-Heffter. Linking pro-

grams to architectures: An object-oriented hierarchical software model based on

boxes. The Common Component Model Example, 5153 of LNCS:238–266, 2008.

[107] Michael L. Scott. Programming Language Pragmatics. Academic Press, first

edition, 2000.


[108] Matthew Smith. A Model of Effects with an Application to Ownership Types.

PhD thesis, Imperial College London, May 2007.

[109] A.C. Sodan, J. Machina, A. Deshmeh, K. Macnaughton, and B. Esbaugh. Par-

allelism via multithreaded and multicore cpus. Computer, 43(3):24–32, March

2010.

[110] Sriram Srinivasan and Alan Mycroft. Kilim: Isolation-typed actors for Java. In

European Conference on Object-Oriented Programming 2008, volume LNCS 5142,

pages 104–128. Springer-Verlag Berlin Heidelberg, 2008.

[111] B. Steensgaard. Points-to analysis in almost linear time. In ACM Symposium on

Principles of Programming Languages, pages 32–41, New York, 1996. ACM.

[112] B. Stroustrup. A history of C++: 1979–1991. In Thomas J. Bergin, Jr. and

Richard G. Gibson, Jr., editors, Proceedings of the ACM History of Programming

Languages conference (HOPL-2), pages 699–769, New York, NY, USA, 1993.

ACM.

[113] Sun Microsystems. JDK 1.1.1 signing flaw. March 1997.

[114] Sun Microsystems. JSR 14: Add generic types to the Java programming lan-

guage. Technical report, Sun Microsystems, 2004.

[115] Sun Microsystems Research. The Fortress Language Specification Version 1.0

Beta. Sun Microsystems, Menlo Park, CA, USA, 2007.

[116] Herb Sutter. A fundamental turn toward concurrency in software. Dr. Dobb’s

Journal, 30(3), March 2005.

[117] Nikhil Swamy, Michael Hicks, Greg Morrisett, Dan Grossman, and Trevor Jim.

Safe manual memory management in Cyclone. Science of Computer Programming,

62(2):122–144, October 2006.

[118] Peter Thiemann. Towards a type system for analyzing JavaScript programs. In

14th European Symposium on Programming, volume LNCS 3444, pages 408–422.

Springer-Verlag Berlin Heidelberg, 2005.


[119] Laurence Tratt and Roel Wuyts. Guest editors’ introduction: Dynamically typed

languages. IEEE Software, 24:28–30, 2007.

[120] University of St. Petersburg. Parallelism dwarfs project. http://

paralleldwarfs.codeplex.com/, April 2009.

[121] P. Wegner. Concepts and paradigms of object-oriented programming. ACM

SIGPLAN OOPS Messenger, 1:7–87, 1990.

[122] Adam Welc, Suresh Jagannathan, and Antony Hosking. Safe futures for Java.

In OOPSLA ’05: Proceedings of the 20th annual ACM SIGPLAN conference on

Object-oriented programming, systems, languages, and applications, pages 439–

453, New York, NY, USA, 2005. ACM.

[123] Darren Willis, David J. Pearce, and James Noble. Caching and incrementalisation

for the Java query language. In Proceedings of the ACM Conference on Object-

Oriented Programming Systems, Languages & Applications, pages 1–17. ACM

Press, 2008.

[124] N. Wirth. Type extensions. ACM Transactions on Programming Languages and

Systems, 10(2):204–214, 1988.

[125] A. Wöß, M. Löberbauer, and H. Mössenböck. LL(1) conflict resolution in a

recursive descent compiler generator. In Joint Modular Languages Conference

(JMLC’03), Klagenfurt, 2003.

[126] Andrew K. Wright and Matthias Felleisen. A syntactic approach to type sound-

ness. Technical Report TR91-160, Rice University, Houston, TX 77251-1892, June

1992.

[127] T. Wrigstad. Ownership-Based Alias Management. PhD thesis, Royal Institute of Tech-

nology Stockholm, 2006.

[128] Tobias Wrigstad and Dave Clarke. Existential owners for ownership types. Jour-

nal of Object Technology, 6(4):141–159, May-June 2007.

[129] Yoav Zibin. Featherweight Ownership and Immutability Generic Java (FOIGJ).

Technical Report ECSTR10-05, School of Engineering and Computer Science,

Victoria University of Wellington, March 2010.