Encapsulation and Abstraction for Modeling and Visualizing Information Uncertainty
Alexander Streit
Bachelor of Information Technology (Honours)
Queensland University of Technology
A thesis submitted in partial fulfilment of the requirements for the degree of
Doctor of Philosophy
March 2008
Principal Supervisor: Prof. Binh Pham
Associate Supervisor: Dr. Ross Brown
Faculty of Information Technology
Queensland University of Technology
Brisbane, Queensland, AUSTRALIA
© Copyright by Alexander Streit 2008
All Rights Reserved
Dedication
For Jasper, Fral, and Jilli.
Keywords
Information Uncertainty Visualization, Information Uncertainty Modeling, Spreadsheets, Visualization Spreadsheets, Uncertainty Visualization Spreadsheets, Visualization Tools, Modeling Tools, Uncertainty Modeling, Uncertainty Visualization, Probability, Fuzzy Visualization, Visualization Frameworks, Visualization
Abstract
Information uncertainty is inherent in many real-world problems and adds a layer of
complexity to modeling and visualization tasks. This often causes users to ignore
uncertainty, especially when it comes to visualization, thereby discarding valuable
knowledge. A coherent framework for the modeling and visualization of information
uncertainty is needed to address this issue.
In this work, we have identified four major barriers to the uptake of uncertainty
modeling and visualization. Firstly, there are numerous uncertainty modeling tech-
niques, and users are required to anticipate their uncertainty needs before building their
data model. Secondly, parameters of uncertainty tend to be treated at the same level
as variables, making it easy to introduce avoidable errors. This causes the uncertainty
technique to dictate the structure of the data model. Thirdly, propagation of uncertainty
information must be manually managed. This requires user expertise, is error prone,
and can be tedious. Finally, uncertainty visualization techniques tend to be developed
for particular uncertainty types, making them largely incompatible with other forms
of uncertainty information. This narrows the choice of visualization techniques and
results in a tendency for ad hoc uncertainty visualization.
The aim of this thesis is to present an integrated information uncertainty modeling
and visualization environment that has the following main features: information and
its uncertainty are encapsulated into atomic variables, the propagation of uncertainty is
automated, and visual mappings are abstracted from the uncertainty information data
type.
Spreadsheets have previously been shown to be well suited as an approach to visu-
alization. In this thesis, we devise a new paradigm extending the traditional spreadsheet
to intrinsically support information uncertainty.
Our approach is to design a framework that integrates uncertainty modeling tech-
niques into a hierarchical order based on levels of detail. The uncertainty information
is encapsulated and treated as a unit allowing users to think of their data model in terms
of the variables instead of the uncertainty details. The system is intrinsically aware of
the encapsulated uncertainty and is therefore able to automatically select appropriate
uncertainty propagation methods.
A user-objectives-based approach to uncertainty visualization is developed to guide
the visual mapping of abstracted uncertainty information. Two main abstractions of
uncertainty information are explored for the purpose of visual mapping: the Unified
Uncertainty Model and the Dual Uncertainty Model. The Unified Uncertainty Model
provides a single view of uncertainty for visual mapping, whereas the Dual Uncertainty
Model distinguishes between possibilistic and probabilistic views. Such abstractions
provide a buffer between the visual mappings and the uncertainty type of the underly-
ing data, enabling the user to change the uncertainty detail without causing the visual-
ization to fail.
Two main case studies are presented. The first case study covers exploratory
and forecasting tasks in a business planning context. The second case study inves-
tigates sensitivity analysis for financial decision support. Two minor case studies are
also included: one to investigate the relevancy visualization objective applied to busi-
ness process specifications, and the second to explore the extensibility of the system
through General Purpose Graphics Processing Unit (GPGPU) use. A quantitative
analysis compares our approach to traditional analytical and numerical spreadsheet-
based approaches. Two surveys were conducted to gain feedback from potential users.
The significance of this work is that we reduce barriers to uncertainty modeling
and visualization in three ways. Users do not need a mathematical understanding of
the uncertainty modeling technique to use it; uncertainty information is easily added,
changed, or removed at any stage of the process; and uncertainty visualizations can be
built independently of the uncertainty modeling technique.
Publications
1. Pham, B. and Streit, A. and Brown, R. “Visualisation of Information Uncertainty: Progress and
Challenges,” in Interactive Visualisation: A State-of-the-Art Survey, Elena Zudilova-Seinstra,
Tony Adriaansen and Robert van Liere (eds.), 2007, Springer, UK. In Print.
2. Streit, A. and Pham, B. and Brown, R. “A Spreadsheet Approach to Facilitate Visualization of
Uncertainty in Information,” IEEE Transactions on Visualization and Computer Graphics, vol.
14, no. 1, pp. 61-72, Jan/Feb 2008.
3. Streit, A. and Pham, B. and Brown, R. Visualisation Support for Managing Large Business Pro-
cess Specifications. International Conference on Business Process Management (BPM). Nancy,
France, September 6-8, 2005. Lecture Notes in Computer Science, Springer. Acceptance rate:
13%
4. Campbell, A. and Berglund, E. and Streit, A. Graphics Hardware Implementation of the Parameter-
Less Self-Organising Map. International Conference on Intelligent Data Engineering and Au-
tomated Learning (IDEAL’05). Brisbane, July 6-8, 2005. Pages 343-350. Lecture Notes in
Computer Science, Springer.
Acknowledgments
This thesis would not have been possible without my principal supervisor, Prof. Binh
Pham, and my associate supervisor, Dr. Ross Brown. Both collaborated to teach me
their process for completing research projects, invaluable knowledge for which I am
very grateful.
I wish especially to thank Fral, who supported me even when it didn’t seem rational
to do so, and my mother, Jilli, who should really be receiving this degree herself. I
also wish to thank my Honours supervisor, Ruth Christie, who inspired me to pursue
postgraduate studies in the first instance.
I wish to thank my colleague, Dr. Robert Smith, who provided me with extensive
insight and feedback, and Alexander Campbell for his many comments and suggestions.
Finally, I wish to thank my business associate, Dr. Andy Boud, for acting as an unof-
ficial mentor.
Abbreviations
ASP Analytical Spreadsheet Package
DUM Dual Uncertainty Model
EBNF Extended Backus-Naur Form
GIS Geographic Information Systems
GPGPU General Purpose Graphics Processing Unit
LIC Line Integral Convolution
NIST National Institute of Standards and Technology
PDF Probability Density Function
QUM Quad Uncertainty Model
SI The Spreadsheet for Images
SIV Spreadsheet for Information Visualization
UML Unified Modeling Language
UUM Unified Uncertainty Model
VTK The Visualization Toolkit
Contents
Abstract vii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Information Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Sources of Information Uncertainty . . . . . . . . . . . . . . 9
2.2.2 Understanding Information Uncertainty . . . . . . . . . . . . 10
2.2.3 Approaches to Modeling Information Uncertainty . . . . . . . 11
2.3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 The Sensemaking Process . . . . . . . . . . . . . . . . . . . 17
2.3.2 Visualization Techniques . . . . . . . . . . . . . . . . . . . . 20
2.4 Information Uncertainty Visualization Approaches . . . . . . . . . . 24
2.4.1 Low-level Features . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.2 Higher-level Constructions . . . . . . . . . . . . . . . . . . . 28
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Framework for Integrated Uncertainty Modeling and Visualization 33
3.1 A New Approach to Information Uncertainty . . . . . . . . . . . . . 33
3.2 Analysis of Issues and Requirements . . . . . . . . . . . . . . . . . . 35
3.2.1 Ad hoc Visualization Techniques . . . . . . . . . . . . . . . . 35
3.2.2 Incoherence of Uncertainty Models . . . . . . . . . . . . . . 38
3.2.3 Artificial Separation of Information and Uncertainty . . . . . 39
3.3 Components of the Framework . . . . . . . . . . . . . . . . . . . . . 40
3.3.1 Spreadsheet Paradigm . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Uncertainty Encapsulation . . . . . . . . . . . . . . . . . . . 42
3.3.3 Uncertainty Abstraction . . . . . . . . . . . . . . . . . . . . 42
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Spreadsheet Paradigm for Information Uncertainty 45
4.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Related Work on Spreadsheets . . . . . . . . . . . . . . . . . . . . . 46
4.3 Architecture and Features . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 Uncertainty Encapsulation . . . . . . . . . . . . . . . . . . . 48
4.3.2 Uncertainty Abstraction . . . . . . . . . . . . . . . . . . . . 50
4.4 New Process and Workflow . . . . . . . . . . . . . . . . . . . . . . . 52
4.5 Capabilities and Advantages . . . . . . . . . . . . . . . . . . . . . . 53
4.6 Case Study: Financial Decision Support . . . . . . . . . . . . . . . . 55
5 Uncertainty Encapsulation and Automated Propagation 61
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Unified Information Uncertainty Framework . . . . . . . . . . . . . . 62
5.2.1 Conceptualizing Information Uncertainty and its Usage . . . . 62
5.2.2 Categorization of Uncertainty Models . . . . . . . . . . . . . 66
5.2.3 Data Structures for Information Uncertainty . . . . . . . . . . 69
5.3 Automated Propagation of Information Uncertainty . . . . . . . . . . 73
5.3.1 Uncertainty Propagation Model . . . . . . . . . . . . . . . . 73
5.3.2 Hierarchical Heterogeneous Propagation . . . . . . . . . . . 74
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6 Uncertainty Abstraction for Visualization 81
6.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 User-objectives for Information Uncertainty Visualization . . . . . . . 82
6.2.1 Analysis of User-Objectives . . . . . . . . . . . . . . . . . . 83
6.2.2 A Computer Assisted User-Objectives Selection Method . . . 86
6.3 Uncertainty Abstraction Models . . . . . . . . . . . . . . . . . . . . 88
6.3.1 The Unified Uncertainty Model . . . . . . . . . . . . . . . . 88
6.3.2 The Dual Uncertainty Model . . . . . . . . . . . . . . . . . . 89
6.3.3 Design and Use . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.3.4 Alternative Models . . . . . . . . . . . . . . . . . . . . . . . 94
6.4 Case Study: User-Objectives in Financial Decision Support . . . . . . 95
6.5 Case Study: Relevancy Objective in Business Process Management . 104
7 Integration of Core Features 113
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3.1 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3.2 Core Components . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.3 Plugin Components . . . . . . . . . . . . . . . . . . . . . . . 124
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8 Advanced Features and Extensibility 129
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.2 Advanced Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.2.1 Hierarchical Spreadsheets . . . . . . . . . . . . . . . . . . . 130
8.2.2 Floating Observers and Embedded Visualizations . . . . . . . 133
8.2.3 Customization . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.3 Extensibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.4 Case Study: GPGPUSheet . . . . . . . . . . . . . . . . . . . . . . . 142
9 Evaluation 145
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
9.2 Quantitative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.2.1 Construction Experiments . . . . . . . . . . . . . . . . . . . 148
9.2.2 Retrospection Experiments . . . . . . . . . . . . . . . . . . . 152
9.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.3 Sensitivity Analysis Surveys . . . . . . . . . . . . . . . . . . . . . . 157
9.3.1 First Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.3.2 Second Survey . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.4 Case Study: Business Planning . . . . . . . . . . . . . . . . . . . . . 162
10 Conclusion and Future Work 169
10.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
10.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
10.3 Possible Applications and Extensions . . . . . . . . . . . . . . . . . 170
Bibliography 173
A First Survey 185
B Second Survey 189
List of Figures
2.1 Fuzzy Set for Hot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Fuzzification for Temperature . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Results of Fuzzy Operations are Shown by the Grey Shaded Regions . 15
2.4 Defuzzification Using an α-cut . . . . . . . . . . . . . . . . . . . . . 16
2.5 Example Rough Set for Containment of a Region . . . . . . . . . . . 16
2.6 Four agents work together in the Visualization Support System . . . . 18
2.7 The Visualization Task Network (VTN) Learns Task-oriented Visual-
ization Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.8 An Ontology of Visualization . . . . . . . . . . . . . . . . . . . . . . 19
2.9 Visualization techniques categorized by the type of data to be visual-
ized [41] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.10 Selected examples of visualization techniques . . . . . . . . . . . . . 21
2.11 The Model-based Visualization Taxonomy . . . . . . . . . . . . . . . 23
2.12 Relationship between Uncertainty Visualization, Information Uncer-
tainty Visualization, Error Visualization, and Fuzzy Visualization . . . 24
2.13 Using opacity to show the structure of uncertainty. Color scheme (left),
Normal rendering (centre), Uncertainty structure (right) . . . . . . . . 26
2.14 Some visual mappings for showing difference. From left to right: over-
lay, rainbow mapping, white-black-white pseudo-coloring, glyph (hi-
pass filter), glyph (low-pass filter) . . . . . . . . . . . . . . . . . . . 27
2.15 How much tip should be given based on the quality of the food and
service using fuzzy inference . . . . . . . . . . . . . . . . . . . . . . 28
2.16 Two frames from an animation that uses a shimmering effect to indi-
cate uncertainty by oscillating luminosity in regions of high uncertainty 30
2.17 A visualization that draws the probability density function over asso-
ciated data points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Visualizations of Employment Numbers in California. Years 2005-
2010 are predicted. (a) Assuming Average Growth (b) Indicating Growth
is Estimated (c) Possible Growth (d) Likely Growth. (Data Source:
California Employment Development Department) . . . . . . . . . . 37
3.2 Description of the framework . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Additional Layer in Spreadsheet Hierarchy . . . . . . . . . . . . . . 41
3.4 Screenshot of the Prototype System . . . . . . . . . . . . . . . . . . 44
4.1 Basic Cell Type Object Hierarchy . . . . . . . . . . . . . . . . . . . 48
4.2 Novel CellType Object Hierarchy . . . . . . . . . . . . . . . . . . . 48
4.3 Screen-shot of the Prototype . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Visualization Sheet for the Graph in Figure 4.7 . . . . . . . . . . . . 51
4.5 Process for Constructing an Uncertainty Spreadsheet . . . . . . . . . 53
4.6 Interval Modeling Example: (a) Original Model (b) Traditional Spread-
sheet (c) Prototype System Uncertainty Hidden (d) Prototype System
Uncertainty Shown . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Using An Interval (±0.5) for Annual Change in Interest Rates Propa-
gates the Uncertainty to NPV . . . . . . . . . . . . . . . . . . . . . . 58
4.8 Volumetric Representation of the Most Likely Effect Interest Rate Changes
Will Have on NPV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1 Progression Through States of Information Uncertainty (Boxes) as a
Result of Information (Arrows) . . . . . . . . . . . . . . . . . . . . . 63
5.2 Projection of Information Uncertainty onto an Estimate Point . . . . . 64
5.3 All Collapses of the Interval [4.5,5.5] . . . . . . . . . . . . . . . . . 65
5.4 All Collapses of a Fuzzy Number Around 5 . . . . . . . . . . . . . . 65
5.5 Collapses for a Fuzzy Number Around 5, Using a Cut Plane . . . . . 65
5.6 The Interval Data Structure . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Definition of Continuous Rough Set Using a Marker Sequence . . . . 72
5.8 Definition of Linearly Defined Fuzzy Set Using a Marker Sequence . 72
5.9 The information uncertainty modeling techniques sorted into three strata 75
5.10 Example of Increasing Levels of Uncertainty Information . . . . . . . 76
5.11 Sample Promotion/Demotion Graph . . . . . . . . . . . . . . . . . . 78
6.1 Graphs illustrating the Visual Treatment of Information with Variable
Degrees of Uncertainty under Different Objectives. . . . . . . . . . . 84
6.2 Schematic Illustration of the Dual Uncertainty Model . . . . . . . . . 90
6.3 UML Diagram of the Dual Uncertainty Model . . . . . . . . . . . . . 92
6.4 Illustration of the Quad Uncertainty Model . . . . . . . . . . . . . . . 94
6.5 Example of a Recursive UUM . . . . . . . . . . . . . . . . . . . . . 95
6.6 Possible effects of interest rate movements on NPV (2D). . . . . . . 97
6.7 Possible effects of house price movements on NPV (2D). . . . . . . . 99
6.8 Possible effects of house price movements on NPV (3D). . . . . . . . 99
6.9 Possible effects of house prices and interest rates on NPV. . . . . . . . 99
6.10 Most likely profitability resulting from changes in interest rates . . . . 100
6.11 Volumetric representation of the most likely effect interest rate changes
will have on NPV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.12 Likelihood of NPV . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.13 Effect of interest rate changes grouped into 5 year periods . . . . . . . 103
6.14 Optimum time to sell the property under different economic conditions. 106
6.15 Architecture of the case study system . . . . . . . . . . . . . . . . . 106
6.16 YAWL query: Prototype tool for the graphical business specification
reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.17 Production rules for reducing a YAWL graph . . . . . . . . . . . . . 108
6.18 Original graph prior to simplification . . . . . . . . . . . . . . . . . . 110
6.19 Reduced specification using collapse approach (α = 2.5) . . . . . . . 111
6.20 Reduced specification for text query “legal” using decimation approach
(β = 0.5, α = 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.1 Design of the User Interface . . . . . . . . . . . . . . . . . . . . . . 114
7.2 The Spreadsheet Architecture . . . . . . . . . . . . . . . . . . . . . . 115
7.3 High-level Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 View and Controller Classes for the Spreadsheet . . . . . . . . . . . . 117
7.5 The Core Components . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.6 UML Inheritance Diagram for the Kernel Class . . . . . . . . . . . . 118
7.7 Main Classes in the Datamodel Component . . . . . . . . . . . . . . 119
7.8 Relationship of the Dependency Graph to Cells and CellContainers . . 120
7.9 Cell C is Dependent on A Multiple Times . . . . . . . . . . . . . . . 120
7.10 UML Inheritance Diagram for the DependencyGraph Class . . . . . . 121
7.11 Example Formula and its CodeTree . . . . . . . . . . . . . . . . . . 123
7.12 Formula Language Definition . . . . . . . . . . . . . . . . . . . . . . 123
7.13 UML Diagram for the CodeTree Class . . . . . . . . . . . . . . . . . 124
7.14 UML Inheritance Diagram for the Propagation Models . . . . . . . . 124
7.15 The IPropagationMethod Class . . . . . . . . . . . . . . . . . . . . . 124
7.16 The IPlugin Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.17 The ICellType Interface . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.18 The IDUMMethod Interface . . . . . . . . . . . . . . . . . . . . . . 126
7.19 The UncertaintyRange and UncertaintyRangeSet Classes . . . . . . . 126
7.20 UML Inheritance Diagram for the IVisualElement Interface . . . . . . 127
8.1 Hierarchical Spreadsheet Prototype. The Parent Sheet (Left) Contains
the Child Sheet (Right) . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.2 Floating Observer, Observing the Uncertainty Line Graph in Cell D14 134
8.3 Dependency Tree for the Floating Observer from Figure 8.2 . . . . . 135
8.4 CellType List Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.5 Propagation Model Editor . . . . . . . . . . . . . . . . . . . . . . . . 137
8.6 Propagation Model, Method, and Model Set . . . . . . . . . . . . . . 138
8.7 Propagation Model Set Editor . . . . . . . . . . . . . . . . . . . . . 139
8.8 Dual Uncertainty Model Selector . . . . . . . . . . . . . . . . . . . . 140
8.9 Prototype system for GPGPU visualization . . . . . . . . . . . . . . 144
9.1 Spreadsheet for First Experiment, Without Uncertainty . . . . . . . . 149
9.2 Spreadsheet for First Experiment, With Uncertainty . . . . . . . . . . 149
9.3 Spreadsheet for First Experiment, Analytical Approach Using Tradi-
tional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
9.4 Monte-Carlo Spreadsheet for First Experiment . . . . . . . . . . . . . 151
9.5 Construction Cost Graph . . . . . . . . . . . . . . . . . . . . . . . . 155
9.6 Number of Formulae in Construction Experiments . . . . . . . . . . . 155
9.7 Formula and Layout Changes During Retrospection Experiments . . . 156
9.8 Background of Respondents . . . . . . . . . . . . . . . . . . . . . . 160
9.9 Average Completion Time . . . . . . . . . . . . . . . . . . . . . . . 161
9.10 Questions and Average Responses . . . . . . . . . . . . . . . . . . . 162
9.11 Hierarchical Business Plan Spreadsheet . . . . . . . . . . . . . . . . 163
9.12 Market Share Overview with Embedded Sheets . . . . . . . . . . . . 164
9.13 Market Share for Year 1 . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.14 Target Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.15 Typical Break-even Visualization . . . . . . . . . . . . . . . . . . . . 166
9.16 Probabilistic Break-even Visualization . . . . . . . . . . . . . . . . . 167
List of Tables
2.1 Sources and Causes of Information Uncertainty . . . . . . . . . . . . 10
2.2 Data Stages in the Data State Model . . . . . . . . . . . . . . . . . . 22
2.3 Transformation Operators in the Data State Model . . . . . . . . . . . 23
4.1 Format of Cells in the Prototype System . . . . . . . . . . . . . . . . 49
4.2 Prototype Uncertainty Interrogation Functions . . . . . . . . . . . . . 52
5.1 Predicted Growth Rates used in Figure 3.1 . . . . . . . . . . . . . . . 62
5.2 Categories of Information Uncertainty Modeling Techniques . . . . . 67
5.3 Common Information Uncertainty Modeling Types . . . . . . . . . . 70
6.1 Information Uncertainty Visualization Objectives . . . . . . . . . . . 83
6.2 Questions Used to Elicit the User-Objective . . . . . . . . . . . . . . 86
8.1 Comparison of IntervalCell and SpreadsheetCell . . . . . . . . . . . . 132
8.2 Examples from the Prototype Addressing Scheme . . . . . . . . . . . 132
8.3 Novel Cell Types for GPGPU . . . . . . . . . . . . . . . . . . . . . . 142
8.4 Novel Functions for GPGPU . . . . . . . . . . . . . . . . . . . . . . 143
9.1 A Selection of Normal Probability Functions in Microsoft Excel 2003 146
9.2 Actions of the User . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.3 Retrospection Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.4 Respondents to the Survey . . . . . . . . . . . . . . . . . . . . . . . 157
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a
degree or diploma at any other higher education institution. To the best of
my knowledge and belief, the thesis contains no material previously pub-
lished or written by another person except where due reference is made.
Signature: Alexander Streit
Date:
CHAPTER 1
Introduction
1.1 Motivation
The term information uncertainty refers to vagueness, imprecision, fuzziness, likeli-
hood, and related uncertainty as it is present in information. Many problems are subject
to information uncertainty and, in response, numerous techniques have been developed
to model this uncertainty. Modeling information uncertainty not only provides greater
confidence in results, but can also give an indication of how much confidence to place
in the results. While visualization is a popular tool, information uncertainty visualiza-
tion is far less widespread.
In this work we have identified four major barriers to the uptake of information
uncertainty modeling and visualization. Firstly, there are numerous information uncer-
tainty modeling techniques, each of which is treated differently. This forces users to
anticipate their information uncertainty needs before building their data model. Sec-
ondly, parameters of the uncertainty space tend to be treated at the same level as vari-
ables, which makes it easier to introduce avoidable errors and causes the information
uncertainty modeling technique to dictate the structure of the user’s model. Thirdly,
propagation of uncertainty information must be manually managed by the user, which
requires expertise, is error prone, and can be tedious. Fourthly, uncertainty visualiza-
tion techniques tend to be developed for particular information uncertainty types and
they are largely incompatible with other forms of uncertainty information. This nar-
rows the selection of visualization techniques available and results in a tendency for ad
hoc information uncertainty visualization techniques.
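The encapsulation idea that addresses the second and third barriers can be illustrated
with a minimal sketch. This is purely illustrative and is not the prototype described
later in the thesis (which is a spreadsheet system); the Interval class and its names
are hypothetical. Once a value and its uncertainty are encapsulated in one atomic
variable, ordinary arithmetic can propagate the uncertainty automatically, with no
manual bookkeeping by the user:

```python
class Interval:
    """A value known only to lie somewhere between lo and hi."""

    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        other = _as_interval(other)
        # The extremes of a product lie at one of the four corner products.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


def _as_interval(x):
    # A plain number is a degenerate (fully certain) interval.
    return x if isinstance(x, Interval) else Interval(x, x)


# An interest rate of 5 +/- 0.5 percent, encapsulated as one variable.
rate = Interval(4.5, 5.5)
cost = rate * 1000 + 250   # uncertainty propagates without user effort
```

Changing `rate` to a certain value, or to a richer uncertainty model, leaves the
formula itself untouched, which is the behaviour the encapsulation approach aims for.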
Information uncertainty modeling makes it more difficult to manage the data model
due to increased information. Furthermore, it is common that a chosen uncertainty
modeling technique will subsequently need to be changed, since knowledge about the
uncertainty changes as more information becomes available. This is currently a diffi-
cult and error prone process.
Visualization of information uncertainty poses its own unique challenges. Existing
visualization techniques may not be appropriate for uncertainty information, and there
are issues with information overloading and interpretability of results. On a practical
level, there is a lack of tools that are conducive to visualizing information uncertainty.
Easing the burden of information overload in modeling and visualization requires
an integrated system that covers the entire workflow cycle from data
acquisition to visualization. Tools are also needed to help users with higher-level tasks
such as selection of modeling and propagation options, and organization and compar-
ison of visual mappings. More specifically, the architecture should support automated
uncertainty propagation and allow easy switching between different uncertainty mod-
els, and different methods of display.
Spreadsheets are often used to perform uncertainty based analysis and they have
previously been shown to be well suited as an approach to visualization. However, the
benefit of a spreadsheet approach to uncertainty modeling and visualization has not yet
been explored. This thesis extends the spreadsheet paradigm to support information
uncertainty modeling and visualization in an integrated whole.
1.2 Aims
The overall aim of this thesis is to devise an integrated information uncertainty mod-
eling and visualization environment that has the following features:
Hierarchical structure: The system should differentiate between levels of detail in
the data model. Uncertainty information is of a lower level of detail than the
variables.
Reduce data type lock-in: If a data model is constructed using particular informa-
tion uncertainty modeling techniques, the cost to change to another modeling
technique should be minimized.
Adaptive: Information about the uncertainty space of a variable should be easy to add,
change, or remove at any stage of the modeling and visualization process.
Seamless integration of information and its uncertainty: There should not be an ar-
tificial separation between the information and its uncertainty.
Simplify information uncertainty modeling: Users should not be required to have
an intimate understanding of the modeling technique mechanics in order to use
it.
Automate propagation: Uncertainty information needs to be propagated and the sys-
tem should carry this out automatically.
Less error prone: The system should reduce the potential for user induced errors.
Flexible: Users should be able to map uncertainty information into alternative models
and visual features so that they can explore the impacts of different modeling
and visualization techniques.
Robust: When the uncertainty information changes, the existing data model and vi-
sualizations should continue to function correctly.
Extensible: There are numerous information uncertainty modeling techniques, and
the design of the system should allow for more to be added.
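The "flexible" and "robust" requirements above hinge on abstraction: a visual mapping
should consume a summary of the uncertainty rather than its concrete type. The
following sketch (Python; the class names are invented for illustration and do not
appear in the prototype) shows the shape of such an abstraction:

```python
from abc import ABC, abstractmethod


class Uncertain(ABC):
    """Common abstraction consumed by visual mappings."""

    @abstractmethod
    def bounds(self):
        """Return a (low, high) range summarizing the uncertainty."""


class IntervalValue(Uncertain):
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def bounds(self):
        return (self.lo, self.hi)


class NormalValue(Uncertain):
    def __init__(self, mean, stdev):
        self.mean, self.stdev = mean, stdev

    def bounds(self):
        # Roughly 95% of the probability mass lies within two
        # standard deviations of the mean.
        return (self.mean - 2 * self.stdev, self.mean + 2 * self.stdev)


def error_bar(value: Uncertain) -> str:
    # The mapping sees only the abstract range, so the underlying
    # uncertainty model can be swapped without breaking the visualization.
    lo, hi = value.bounds()
    return f"|--[{lo}..{hi}]--|"
```

Replacing an IntervalValue with a NormalValue changes how the range is computed,
but the error_bar mapping continues to function, which is the robustness the list
above asks for.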
In order to achieve this aim, the following tasks are performed:
• Examine the field to determine the current state of play, covering information
uncertainty modeling techniques, visualization processes and practices, and un-
certainty visualization;
• Design an integrated information uncertainty modeling and visualization frame-
work;
• Investigate how the spreadsheet paradigm can be extended to intrinsically sup-
port information uncertainty modeling and visualization;
• Explore uncertainty encapsulation as an approach to semantic association of in-
formation and its uncertainty;
• Develop an automated propagation mechanism and a method for resolving un-
usual modeling technique combinations;
• Design uncertainty abstractions that enable visualization mappings to be data
type independent;
• Explore the user-objectives approach as a means for defining visualization char-
acteristics;
• Conduct a quantitative analysis comparing the cost of our approach to existing
methods;
• Analyze feedback from potential users;
• Conduct case studies on financial decision support and business planning to es-
tablish the viability of the spreadsheet for commercial uses;
• Investigate the capability of the architecture to be applied to non-uncertainty uses
through a case study; and
• Draw conclusions and make recommendations for future work.
1.3 Scope

This thesis deals with information uncertainty, which is uncertainty about the true value
of a unit of information. The intrinsic connection between uncertainty and information
is the basis for our encapsulation approach, which underpins the automatic propagation
and visualization-oriented uncertainty abstraction. However, there exist several other
forms of uncertainty, such as uncertainty arising from interpretation, for which the
encapsulation approach may not be suitable. The methods presented in this thesis are
limited to those forms of uncertainty that can be parametrized in some quantifiable
way.
Modeling of uncertainty has its foundation in mathematics. This project is con-
cerned with the frameworks, approaches, and methods for applying these modeling
techniques. As such, mathematical issues will be touched on; however, detailed cov-
erage of mathematical models is beyond the scope of this work, and it is assumed that
users will use the mathematical techniques appropriate to their problem.
1.4 Original Contribution

Current investigations into information uncertainty visualization have focused on vi-
sualization techniques for particular information uncertainty data types. We approach
the problem of information uncertainty visualization holistically, from modeling and
automated propagation through to user-objectives in visualization.
We produce an integrated information uncertainty modeling and visualization frame-
work and design the information uncertainty visualization spreadsheet, which intrin-
sically supports information uncertainty modeling, automated uncertainty propagation,
and uncertainty model abstracted visualization.
To achieve this we extend the spreadsheet paradigm to incorporate information un-
certainty and visualization features. This requires a number of components. Firstly,
our encapsulation of uncertainty information approach semantically links the infor-
mation to its uncertainty. Secondly, we introduce the uncertainty propagation model
to manage the mechanics of propagating uncertainty, including operations involving
mixed data type parameters. Thirdly, we present hierarchical heterogeneous propaga-
tion, which automatically determines suitable combinations of the available methods
to ensure that the propagation can be achieved. Fourthly, we produce uncertainty ab-
straction models, which abstract the uncertainty information for visual mapping in vi-
sualizations by providing a common plural value type. Fifthly, we incorporate flexible
visualization capabilities into the spreadsheet using a visualization sheet.
Abstraction from the information uncertainty data type means that traditional data
type specific visual mapping criteria may no longer be applicable, leaving a gap in the
knowledge. To address this, we investigate user-objectives for information uncertainty
visualization, which describe the characteristics of uncertainty space that the user is
seeking to visualize. User-objectives provide a data type abstracted means of describ-
ing, executing, and evaluating visualizations.
1.5 Significance

The significance of this work is that it provides an intuitive and non-intrusive
environment for modeling and visualizing information uncertainty. This has
three major effects. Firstly, access to information uncertainty visualization is designed
into the system from the outset and it does not require user expertise in uncertainty
techniques to manage information uncertainty. Secondly, uncertainty information is
easily added, changed, or removed at any stage of the process. Thirdly, information
uncertainty visualizations can be built independently of the modeling technique, pro-
viding a coherent foundation for the development of visualization techniques while
reducing their tendency to be ad hoc.
Information uncertainty is a problem in many fields. Overcoming barriers to its
modeling and visualization is an important step in managing a difficult problem.
1.6 Organization of the Thesis

The organization of this thesis is as follows. Chapter 2 introduces background mate-
rial on uncertainty modeling techniques, visualization techniques, and what has been
done to visualize uncertainty. Chapter 3 describes the framework that integrates infor-
mation uncertainty modeling and visualization tasks together into a coherent whole.
Chapters 4 through 6 cover the components of the framework: Chapter 4 elaborates
on the spreadsheet paradigm as a mechanism for integrating and managing these tasks.
Chapter 5 investigates the encapsulation approach to information uncertainty, which
includes the unified hierarchy and automated propagation. Chapter 6 explores the ab-
straction approach, which includes uncertainty abstraction models and user-objectives
for visualization. Chapter 7 integrates the components into a core system, covering
the requirements, design, and architecture. Chapter 8 considers advanced features and
extensibility of the system. Chapter 9 presents the evaluations of the system, with a
comparative analysis of different approaches, a discussion of a survey, and a case study
in business planning. Chapter 10 provides a conclusion and points to future work.
CHAPTER 2
Background
“As far as the laws of mathematics refer to reality, they are not certain;
and as far as they are certain, they do not refer to reality.”
– Albert Einstein1
2.1 Introduction

Information uncertainty is a complex subject that is inherent in many real-world prob-
lems. The uncertainty comes from different sources and can be interpreted and mod-
eled in various ways. There are often subtle interactions between variables and uncer-
tainty, which can be difficult to understand. Visualization of information uncertainty
presents an opportunity to provide deeper insights into the nature of the information,
its uncertainty, and the impact it has on outcomes. However, the difficulty of adopt-
ing information uncertainty and the lack of visualization tool support has caused many
practitioners to ignore the uncertainty completely or to ignore situations where the
uncertainty is deemed too high. This practice results in valuable knowledge being dis-
carded and reduced quality in outcomes, or worse, can even result in entirely wrong
outcomes.
There are aspects of information uncertainty that have been given considerable at-
tention, particularly in the field of mathematics. Two aspects have been especially
well developed: the first includes the various mathematical models that exist for repre-
senting, measuring, and recording uncertainty. The second aspect is the collection of
1In J. R. Newman (ed.) The World of Mathematics, New York: Simon and Schuster, 1956
rules and techniques for propagating, estimating, and minimizing information uncer-
tainty. These models and techniques range from the statistical methods and probabil-
ities through to fuzzy models. Research into visualization of information uncertainty
has only been carried out sporadically during the last decade. Earlier work has focused
on a data-driven approach, with visual data representations for particular data types or
responding to the needs of specific applications. More recent work has investigated
task-based approaches and sought to integrate higher-level issues, such as software
architectures and frameworks for visualization systems.
The aim of this chapter is to provide background for understanding information un-
certainty modeling and visualization by examining relevant works and identifying key
issues. The chapter is organized as follows. Section 2.2 describes information uncer-
tainty in general, covering sources of information uncertainty, understanding informa-
tion uncertainty and its usage, and information uncertainty modeling techniques. Sec-
tion 2.3 discusses relevant issues in visualization, focusing on the process and sense-
making cycle, and visualization techniques. Section 2.4 examines current progress and
key techniques in information uncertainty visualization. A summary of this chapter is
given in Section 2.5.
2.2 Information Uncertainty

In many circumstances the true value of a variable is not fully known, giving rise
to information uncertainty. The information that is known about the variable can be
stored and this technique is referred to as information uncertainty modeling. As an
example, information uncertainty modeling can be used to aid analysis of the potential
environmental impact of a new road. Data is required about the type, amount, and
distribution of vegetation; the variety, location, and habits of local animals; and how
all of these interact. The data that is collected will only be accurate to a certain level
of precision, which can be modeled. Further, much of the information derived from
expert knowledge will be qualitative in nature and thus dependent on interpretation.
It is already a significant task to understand the structure, characteristics, trends,
and interdependency of data. However, information uncertainty serves to complicate
things even further as it requires an understanding of the propagation of uncertainty,
the potential for variation in outcomes, and impacts due to changes in the level of
information uncertainty. Effective visualization of the information and its uncertainty
can help to overcome this problem.
Historically, uncertainty had been regarded as an undesirable factor that is to be
avoided. Only in the 20th century has it become a fundamental component of sci-
ence [62]. However, the term uncertainty itself can vary depending on the author and
the field. For example, Hunter and Goodchild, dealing with spatial databases, reserve
the term uncertainty to refer exclusively to unknown inaccuracy and instead use the
term error for objectively known inaccuracy [47]. Pang et al. use the term uncer-
tainty to cover three categories [86]: statistical, including probabilistic and confidence
methods; error, which refers to differences between estimates and actual values; and
range, which covers intervals of possible values. Klir [58, 60] and Gershon [33] offer
a more general definition of uncertainty as some deficiency in information, and from
there define a measure of information in terms of reduction in uncertainty. Standards
and guidelines have been developed for the management of uncertainty in measure-
ment. One such guide by the National Institute of Standards and Technology (NIST)
describes measurements as approximations and contends that “the result is complete
only when accompanied by a quantitative statement of its uncertainty” [112, pp. 1]. A
similar guide that was issued for analytical chemistry by EURACHEM defines mea-
surement uncertainty as a parameter “that characterizes the dispersion of the values
that could reasonably be attributed to the measurand” [29, pp. 4]. The common theme
is that uncertainty can be characterized for a particular unit of information, and we use
the term information uncertainty to refer to situations where this condition holds.
2.2.1 Sources of Information Uncertainty

Pang et al. [86] investigated uncertainty visualization and categorized sources of in-
formation uncertainty based on the point of the visualization process in which it is
introduced. The resulting three categories are acquisition, where information uncer-
tainty is introduced from the measurements and models; transformation, introduced
during the information processing step for visualization; and visualization, referring to
the uncertainty introduced through the act of the visualization itself. These categories
are helpful in characterizing the introduced uncertainty for visualization, but lack gran-
ularity in describing the reason for the uncertainty. Thomson et al. [114] focused on
the tasks of information analysts in the field and used their descriptive terms to derive
a categorization for uncertainty in geospatially referenced information.
Information uncertainty can arise due to a number of reasons. Whenever predic-
tions are made, they are uncertain. Errors and imprecision in measurement are another
common source. The EURACHEM guide lists eleven sources of measurement uncer-
tainty [29], but is careful to point out that these may not necessarily be independent.
While their list includes “operator effects” to cover human introduced uncertainty, the
sources are mostly concerned with acts of measurement. Pham and Brown [91] pro-
vide a categorization of uncertainty into three categories: factual, pseudo-measurement
and pseudo-numerical, and perceptual-based. Factual information is numerical and
measurement-based. Pseudo-measurement and pseudo-numerical information are nu-
meric approximations. Perceptual-based information is typically linguistic, but can
also be image- or sound-based. Table 2.1 lists typical sources of information un-
certainty and examples of causes (from [92]). Earlier work by Reznik and Pham
matched nine similar categories of uncertainty sources to uncertainty modeling tech-
niques [101].
Limited accuracy: Limitation in measuring instruments, computational processes, or
standards.

Missing data: Physical limitation of experiments; limited sample size or
non-representative sample.

Incomplete definition: Impossibility or difficulty in articulating exact functional
relationships or rules.

Inconsistency: Conflicts arising from multiple sources or models.

Imperfect realisation of a definition: Physical or conceptual limitation.

Inadequate knowledge about the effects of the change in environment: Model does
not cover all influence factors; or is made under slightly different conditions; or
is based on the views of different experts.

Personal bias: Differences in individual perception.

Ambiguity in linguistic descriptions: A word may have many meanings; or a state
may be described by many words.

Approximation or assumptions embedded in model design methods or procedures:
Requirements or limitations of models or methods.

Table 2.1: Sources and Causes of Information Uncertainty
2.2.2 Understanding Information Uncertainty

The search for truth is a goal of science and the presence of uncertainty can imply
a deficiency in our understanding. This explains why throughout most of recorded
history scientific thought has sought to avoid uncertainty2. However, attitudes toward
uncertainty have begun to shift, partly due to discoveries such as the Heisenberg un-
certainty principle. Today, uncertainty is viewed as an intrinsic property of problems
in most fields. For example, Couclelis noted that considerable effort had been devoted
to fighting uncertainty in Geographic Information Systems (GIS), but that there are
many things that cannot be known, and the inability to know was not due to human
limitation [19].

2The interested reader is directed to Appendix A of [2] for a history of perspectives on knowledge.
Many disciplines use information uncertainty modeling techniques to manage un-
certainty. The incorporation of information uncertainty techniques enables practition-
ers to describe and quantify the uncertainty space. Uncertainty information at the
inputs can then be propagated through the model to the outputs. The output can now
provide additional information. For example, how much confidence we should place in
the result, what alternatives the result may have, and others depending on the modeling
technique and the inputs.
A recent use for information uncertainty techniques is to simplify systems by re-
moving less important information. This mirrors human reasoning, where we reserve
detail for items of interest. For example, when ascertaining whether to jump out of the
way of a moving vehicle, a rough estimate of the vehicle’s velocity is usually sufficient
to determine the appropriate action [77] and precise knowledge of the actual velocity
is usually not necessary.
There are two main approaches to information uncertainty modeling and propaga-
tion. The first approach is to use analytical techniques, which require an understanding
of mathematical principles involved. The second approach is to use numerical tech-
niques, such as Monte-Carlo simulation. Sometimes numerical techniques are used
implicitly without the user realizing, usually by manually varying the inputs and ob-
serving their effects.
Uncertainty is so intrinsic to information that Klir [62, 63, 58, 61, 59, 60] has been
working on a generalized information theory, which has the aims of incorporating
uncertainty and information into a unified theory. Their approach is to conceive of
uncertainty-based information as being the result of a reduction in uncertainty. As a
result of some action, the a priori uncertainty U1 becomes the a posteriori uncertainty
U2, and the information derived from this action is therefore given by U1 − U2 [59].
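As a minimal illustration of this view (assuming, for concreteness, the Hartley measure of nonspecificity over a set of equally possible alternatives as the uncertainty measure; the numbers are invented):

```python
import math

def hartley(n_alternatives: int) -> float:
    """Hartley uncertainty (in bits) over a set of equally possible alternatives."""
    return math.log2(n_alternatives)

# A priori, the true value could be any of 8 alternatives; an observation
# then rules out all but 2 of them.
u1 = hartley(8)            # a priori uncertainty U1
u2 = hartley(2)            # a posteriori uncertainty U2
information_gained = u1 - u2
print(information_gained)  # 2.0 bits
```

The information yielded by the observation is exactly the reduction in uncertainty, U1 − U2.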
Using information uncertainty modeling techniques not only provides greater con-
fidence in results, but can also give an indication of how much confidence to place in
the result.
2.2.3 Approaches to Modeling Information Uncertainty

There are numerous uncertainty modeling techniques and we will describe several of
the major ones here. Since the sources and causes of uncertainty are different, various
mathematical models have been developed to faithfully represent different types of
information. We summarise these models into four common types:
Probability denotes the likelihood of an event to occur or a positive match. Prob-
ability theory provides the foundation for statistical inference (e.g. Bayesian
methods) [1, 5].
Possibility provides alternative matches, e.g. a range of errors in measurement [2].
Provability is a measure of the ability to express a situation where the probability of
a positive match is exactly one. Provability is the central theme of techniques
such as Dempster-Shafer calculus.
Membership denotes the degree of match and allows for partially positive matches,
e.g. fuzzy sets [3, 6], rough sets [4].
Probability theory models uncertainty in terms of anticipation: the expectation that an
outcome will eventuate is characterized by a probability.
Classical probability theory describes the ratio between favorable and indifferent
outcomes, which has several shortcomings. This led to the development of frequentist
probability theory, which defines the chance of a given result under random conditions.
Thus, in a repeated experiment the probability of an event will tend toward the ratio
of the number of times it occurs to the number of times the experiment was run:
Pr(x) = x / (x + x̄)

where Pr : X → [0,1] is the probability function, x ∈ X is the event, and x̄ = X − {x}
is all other outcomes. A probability distribution completely describes the expected
outcomes of a random variable. For real-valued random variables, the probability dis-
tribution can be defined by
F(x) = ∑_{xᵢ ≤ x} Pr(xᵢ)

for discrete probabilities and

F(x) = ∫_{−∞}^{x} f(t) dt

for continuous probabilities, where f is the probability density function. A proba-
bility density function (PDF) is effectively a histogram of expected outcomes, with a
scale such that the integral is unity, ∫ f(t) dt = 1.
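These definitions translate directly to code. The sketch below (the outcome data are invented for illustration) estimates a discrete probability function from observed frequencies and evaluates the corresponding distribution function F:

```python
from collections import Counter

def empirical_pr(outcomes):
    """Estimate Pr(x) as the ratio of occurrences to trials (the frequentist view)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def F(pr, x):
    """Discrete distribution function: the sum of Pr(x_i) over all x_i <= x."""
    return sum(p for xi, p in pr.items() if xi <= x)

rolls = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
pr = empirical_pr(rolls)
print(pr[3])               # 0.3
print(round(F(pr, 3), 3))  # 0.6
```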
Probability distributions can take any form; however, several well-studied distri-
butions exist. Of these, the two most commonly used distributions are the uniform
distribution and the normal distribution. The uniform distribution assigns every out-
come an equal probability. The normal distribution3 has a PDF of
f(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²))

3also known as the Gaussian distribution after Gauss
where μ is the mean and σ is the standard deviation. Normal distributions find
common use because the sum of many independent random variables will approximate
a normal distribution.
Monte-Carlo simulation is a numerical approach to uncertainty that uses probabil-
ity distributions [76, 3]. Input variables are assigned a probability distribution, com-
monly a uniform or normal distribution. Numerous random instances are chosen for
the inputs according to these distributions, and the outputs that are calculated can then
be used to characterize the outputs of the system.
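A minimal sketch of this numerical approach (the model and input distributions here are hypothetical, chosen only to illustrate the mechanics):

```python
import random
import statistics

def monte_carlo(model, samplers, n=100_000):
    """Propagate input uncertainty through `model` by repeated random sampling."""
    return [model(*(sample() for sample in samplers)) for _ in range(n)]

random.seed(1)
# Hypothetical inputs: a length of 10 +/- 0.5 (normal) and a width in [4, 6] (uniform).
length = lambda: random.gauss(10, 0.5)
width = lambda: random.uniform(4, 6)

areas = monte_carlo(lambda l, w: l * w, [length, width])
print(round(statistics.mean(areas), 1))   # close to the expected area of 50
print(round(statistics.stdev(areas), 2))  # spread induced by the input uncertainty
```

The collected outputs can then be summarized (mean, spread, percentiles) to characterize the uncertainty of the result.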
The frequentist view of probability models an expectation based purely on fre-
quency of events. An alternative view is Bayesian probability theory, which has found
widespread use in fields such as machine learning and computer vision [90], and
econometrics [35]. The mathematician Thomas Bayes introduced a theorem that was
generalized by Laplace but is still referred to as Bayes’ Theorem. The theorem relates
the conditional and marginal probability of events of two random variables, x and y:
Pr(x|y) = Pr(y|x) Pr(x) / Pr(y)
Bayes’ theorem enabled a new philosophical view of probabilities as modeling
belief, using what is called Bayesian inference. Thus, our expectation of an event can
be revised: Pr(x|y) is the posterior probability, our revised expectation of event x given
evidence y; Pr(y|x) is the conditional probability of seeing y given the hypothesis that
x is true; Pr(x) is the prior probability of x; and Pr(y) is the marginal probability of y,
whether or not x is true.
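The theorem is straightforward to apply in code. In this sketch (the numbers are hypothetical), the marginal Pr(y) is expanded using the law of total probability:

```python
def posterior(prior_x, pr_y_given_x, pr_y_given_not_x):
    """Pr(x|y) via Bayes' theorem, with Pr(y) expanded by total probability."""
    pr_y = pr_y_given_x * prior_x + pr_y_given_not_x * (1 - prior_x)
    return pr_y_given_x * prior_x / pr_y

# Hypothetical diagnostic test: 1% prevalence, 95% true-positive rate,
# 5% false-positive rate.
print(round(posterior(0.01, 0.95, 0.05), 3))  # 0.161
```

Even a highly accurate test yields a modest posterior belief when the prior is small, which is exactly the kind of revision of expectation that Bayesian inference captures.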
Probabilities, whether frequentist or Bayesian, are not the only means of describ-
ing uncertainty. Classical sets can also be used to express uncertainty. For example, the
diagnosis made by a medical doctor could be that a patient suffers from the flu or the
cold. In this situation there is a possibility that either (or both) be true and there is un-
certainty about which it is. Lotfi Zadeh, in his seminal 1965 paper, proposed fuzzy sets,
which along with fuzzy logic enable human-like reasoning using partial truth4. A fuzzy
set is a set where each member can be assigned a truth value. This can be expressed as a
fuzzy membership function: μ : X → [0,1], where 0 indicates not a member, 1 indicates
definitely a member, and all numbers in between indicate a partial membership.
A good example of fuzzy sets is given by Mendel in [77]. College students were
asked to rank words such as “somewhat” and “quite a bit” against a numerical scale of
quantity. Although individual answers varied significantly, a clear ordering emerged
and Mendel was able to produce a mapping between these words and their indication of
4Zadeh was not the first to investigate partial truth; interested readers are directed to the works of Łukasiewicz [71]
quantity. From there fuzzy sets can be constructed, such as the set lots, which captures
the degree to which numeric values can be representations for each of the words.
For example, assuming that 24◦C is normal room temperature and 30◦C is com-
pletely hot, temperatures between 24◦C and 30◦C are partially hot. The set of hot
temperatures might therefore be given by the following membership function, illus-
trated graphically in Figure 2.1:
μHot(x) =  1,                      x > 30
           (x − 24)/(30 − 24),     24 ≤ x ≤ 30
           0,                      x < 24
Figure 2.1: Fuzzy Set for Hot
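The membership function above maps directly to code; a minimal sketch (the function name is ours):

```python
def mu_hot(x: float) -> float:
    """Degree of membership of temperature x (deg C) in the fuzzy set Hot."""
    if x > 30:
        return 1.0
    if x < 24:
        return 0.0
    return (x - 24) / (30 - 24)

print(mu_hot(22))  # 0.0 -- not hot at all
print(mu_hot(27))  # 0.5 -- partially hot
print(mu_hot(31))  # 1.0 -- completely hot
```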
Fuzzy logic is the inference counterpart for fuzzy sets. A complete fuzzy logic sys-
tem consists of a fuzzifier, inference engine, and defuzzifier [77]. The fuzzifier maps
numeric values to partial memberships of fuzzy sets. The inference engine executes
operations and rules on fuzzified information. The defuzzifier converts a fuzzy repre-
sentation into a bi-valued representation, typically using an α-cut.
The graph in Figure 2.2 shows an example fuzzifier that maps room temperatures to
the fuzzy sets Cold, Normal, and Hot. A temperature of 25.5◦C will be wholly outside
the set Cold, mostly within the set Normal, and to a lesser extent partially within the
set Hot.
Methods are defined for several operations, including fuzzy AND (intersection),
OR (union), and NOT (complement)5. Traditional fuzzy logic, also called Zadeh fuzzy
logic after its inventor, is shown graphically in Figure 2.3. Given two fuzzy variables,
a and b:

a ∪ b = a OR b = max(μ(a), μ(b))
a ∩ b = a AND b = min(μ(a), μ(b))
¬a = NOT a = 1 − μ(a)

5Implementations of the operators can vary depending on the application; interested readers are directed to [77]
Figure 2.2: Fuzzification for Temperature
(Panels: Set A; Set B; A AND B; A OR B; NOT A.)
Figure 2.3: Results of Fuzzy Operations are Shown by the Grey Shaded Regions
Thus rules can be established using constructs such as D = A ∧ B ∨ ¬C, where A,
B, C, and D are fuzzy sets.
The defuzzifier typically uses an α-cut, which is a mechanism to translate the fuzzy
output into traditional bi-valued truth, most typically:
μ′(d) =  1,  μ(d) ≥ α
         0,  otherwise

where α is called the "α-cut plane" and is in the range [0,1]. An example
of defuzzification with an α-cut of 0.5 is given graphically in Figure 2.4. As the graph
shows, the set of normal temperatures is mapped to the interval [21.5,26.5].
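The Zadeh operators and an α-cut defuzzifier can be sketched as follows (function names are ours, for illustration):

```python
def fuzzy_or(a: float, b: float) -> float:   # union: max of memberships
    return max(a, b)

def fuzzy_and(a: float, b: float) -> float:  # intersection: min of memberships
    return min(a, b)

def fuzzy_not(a: float) -> float:            # complement
    return 1.0 - a

def alpha_cut(membership: float, alpha: float = 0.5) -> int:
    """Defuzzify a membership degree into a bi-valued truth."""
    return 1 if membership >= alpha else 0

a, b = 0.7, 0.2
print(fuzzy_or(a, b))             # 0.7
print(fuzzy_and(a, b))            # 0.2
print(round(fuzzy_not(a), 1))     # 0.3
print(alpha_cut(fuzzy_or(a, b)))  # 1
```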
Figure 2.4: Defuzzification Using an α-cut
A popular representation for uncertainty is the rough set [96]. Rough sets extend
classical sets to allow an element to be both inside and outside the set. Thus there are
three modes: inside, outside, and both. Three operations are defined that translate to
a classical set: the upper limit, the lower limit, and the boundary. The upper limit in-
cludes all items that are wholly inside, or both inside and out. The lower limit includes
only those items that are wholly inside the set. The boundary includes only items that are both
inside and outside the set. This is illustrated graphically in Figure 2.5. An example
application of rough sets is in classifying customer details: the rough set information
provided will contain a customer if all information has been provided, not contain a
customer if no information is provided, and be in both states if some information is
provided but some is missing. The company can send letters requesting information to
¬LOWER(c) where c is the “information provided” rough set.
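The customer example can be sketched as follows (the class and names are ours, for illustration):

```python
class RoughSet:
    """A rough set: elements may be inside, outside, or both (the boundary)."""

    def __init__(self, inside=(), boundary=()):
        self._inside = set(inside)      # wholly inside the set
        self._boundary = set(boundary)  # both inside and outside

    def lower(self):
        """Lower limit: items wholly inside the set."""
        return set(self._inside)

    def upper(self):
        """Upper limit: items wholly inside, or both inside and outside."""
        return self._inside | self._boundary

customers = {"alice", "bob", "carol"}
# alice provided all details, bob provided some, carol provided none.
provided = RoughSet(inside={"alice"}, boundary={"bob"})

# Send information-request letters to NOT LOWER(provided):
print(sorted(customers - provided.lower()))  # ['bob', 'carol']
```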
Figure 2.5: Example Rough Set for Containment of a Region
Another common classical set-based uncertainty modeling technique is the inter-
val. Intervals define the upper and lower boundaries on a continuum, most commonly
R. The boundaries themselves can be inclusive, which is indicated using square brack-
ets; or exclusive, indicated using rounded brackets. Thus, [0,1) includes zero but ex-
cludes one. Interval arithmetic defines the propagation of uncertainty under common
arithmetic operators. For example, addition is defined as:
[a, b] + [c, d] = [a + c, b + d]
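Interval propagation is simple to implement; a sketch of our own (supporting addition and, for comparison, multiplication, whose bounds come from the extreme endpoint combinations):

```python
class Interval:
    """A closed interval [lo, hi] with uncertainty propagation under + and *."""

    def __init__(self, lo: float, hi: float):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product bounds come from the four endpoint combinations.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

print(Interval(0, 1) + Interval(2, 3))   # [2, 4]
print(Interval(-1, 2) * Interval(3, 4))  # [-4, 8]
```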
2.3 Visualization

2.3.1 The Sensemaking Process

The user has a visualization objective. To achieve this objective they will use a visual-
ization technique. The technique transforms information and displays it according to
the parameters of the technique. To reach their objective, users will typically iteratively
adjust the parameters and view the results, repeating as often as necessary. There are
three general classes of user objectives, which can also be described as visualization
phases [56]:
1. Exploration, searching the data for relations and patterns
2. Analysis, exploring known relations
3. Presentation, preparing the visualization to communicate information to others
Visualization requires an iteration of choose-inspect-view-adjust, which can be
cumbersome, particularly for novice visualization users. Several studies have there-
fore considered improving the user experience while going through the visualization
process [21, 49, 93, 11].
One study sought to encode the visualization exploration process using an XML-
based language [49]. The encoding captures the parameters used for each iteration of
the loop. By defining a parameter derivation calculus, the results of several visualiza-
tion sessions can then be visualized. Such visualizations of visualization sessions are
designed to aid the user in understanding the progression of their use of the system.
Although the work seeks to formalize the visualization process, it does not improve the
process and is limited to modulating parameters of a particular visualization technique.
A significant drawback of the work in [49] is that the ability to change to another type
of representation or selection of alternate data are not included in the model.
Visualization is a tool and not an objective in itself. However, it has been ob-
served (e.g. [73]) that some visualizations are good for publications, tending to be
colorful and showy images, but not informative or applicable to real-world problem-
solving. Ma [73] argues that scientists need to be involved in evaluating the effec-
tiveness of visualization methods, and suggests working with users from application
domains both to devise the requirements of the visualization and to subsequently eval-
uate the techniques through case studies.
Another suggestion for overcoming these obstacles is for visualization to be task-
driven instead of data-driven [93, 11]. One approach to this is through an agent-based
framework [93] (see Figure 2.6), where a profile agent observes the user’s choice of
visualizations and adjusts the system's behavior to improve workflow.
Figure 2.6: Four agents work together in the Visualization Support System
Another proposal is the “Visualization Task Network (VTN)” [11] (see Figure 2.7,
from [11, pp. 603]), which can learn the requirements of the user. A VTN is a task-
oriented approach, where the user first selects the task to be achieved. For each chosen
task a set of techniques are proposed by the system. Once a technique is chosen, a
list of attributes (e.g. glyph information, grid spacing, and color) is presented. These
parameters are similar to those in the work of Jankun-Kelly et al. [49]. Each time the
user selects a {task, technique, attribute} set for visualization, the system can increase
the weight of that combination. When the user selects a task, the techniques with the
highest weighting are shown first. Similarly, once a task and technique are chosen, the
attributes with the highest weighting are shown first.
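The weighting scheme described above can be sketched as follows (the class and example data are illustrative, not taken from [11]):

```python
from collections import defaultdict

class VTNWeights:
    """Sketch of VTN-style learning over {task, technique, attribute} selections."""

    def __init__(self):
        self.weight = defaultdict(int)

    def record_use(self, task, technique, attribute):
        # Each selected combination has its weight increased.
        self.weight[(task, technique, attribute)] += 1

    def ranked_techniques(self, task):
        # Techniques for a task, highest accumulated weight first.
        totals = defaultdict(int)
        for (t, technique, _attr), w in self.weight.items():
            if t == task:
                totals[technique] += w
        return sorted(totals, key=totals.get, reverse=True)

vtn = VTNWeights()
vtn.record_use("explore flow", "streamlines", "color")
vtn.record_use("explore flow", "streamlines", "glyph size")
vtn.record_use("explore flow", "contours", "color")
print(vtn.ranked_techniques("explore flow"))  # ['streamlines', 'contours']
```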
One approach to mapping visual features to visualization techniques takes an objective-
oriented viewpoint, and is derived from the visualisation data ontology outlined in [11].
The mapping begins with the choice of data attributes to be represented: relationships,
resemblances, order and proportion [34]. These attributes are then mapped to visual
features depending on the visualisation task to be performed. The visualisation task
is chosen to enhance the perception of the information required by the viewer for
their specific objective in performing the data analysis. The knowledge required for
such a task-oriented approach is encapsulated in an agent-based visualisation architec-
ture [11].
Figure 2.7: The Visualization Task Network (VTN) Learns Task-oriented Visualization Parameters
A workshop [27] established the visualization ontology represented
in Figure 2.8 (adapted from [27]). Development of the ontology involved investigation
of visualization from multiple perspectives. The result is a clear anatomy of
visualization, except that it is missing one vital part: the role of the user. The user
plays an integral role in the visualization process, driving parameters and the visual-
ization tasks. By excluding the user from the ontology the authors have neglected to
consider not only usability and cultural issues, but also opportunities such as adaptive
visualization systems.
[Figure content: ontology concepts (Visualisation, Data, Representation, Task, Technique, Transformation) linked by properties such as "is about", "uses", "supported-by", "input to", "output from", and "is realised through"; a key denotes is-a links, named properties, and elided hierarchies.]
Figure 2.8: An Ontology of Visualization
2.3.2 Visualization Techniques

The topic of visualization is traditionally introduced through reference to a taxonomy
of visualization techniques [41, 99, 15, 16]. A valuable reference text for visualization
techniques is given by Senay and Ignatius [106]. SIGGRAPH’s visualization education
program [41] introduces visualization techniques through a data type based classifica-
tion, which is reproduced in Figure 2.9. Examples of selected techniques are given in
Figure 2.10. The limitation of this classification is that it deals only with continuous
ordinal values. Visualizations for other types of data, such as trees, are not included.
Figure 2.9: Visualization techniques categorized by the type of data to be visualized [41]
Shneiderman [108] recognized the lack of trees and network graphs and addressed
this by including non-ordinal types in their taxonomy. However, the taxonomy itself
Figure 2.10: Selected examples of visualization techniques: (a) 2D line-based contouring, (b) 2D histogram, (c) 3D streamlines
continues to be based on the data type being visualized. The data types identified are
[108, pp. 337-339]:
• 1-Dimensional, such as textual documents, program source code, and alphabeti-
cal lists of names.
• 2-Dimensional, such as geographic maps, floor plans, and newspaper layouts.
• 3-Dimensional, real world objects such as molecules, the human body, and build-
ings.
• Temporal, such as medical records, project management, or hierarchical presentations;
these warrant a data type that is separate from 1-dimensional data.
• Multi-dimensional, such as records in relational databases.
• Trees, which are hierarchies where each item, except the root item, has a link to
its parent.
• Networks, which represent relations that cannot be captured as trees.
OLIVE [99] is an online catalog of visualization systems categorized according to
this taxonomy, although at the time of writing it is only current up to 1997. While
Shneiderman’s taxonomy covers a wider range of visualizations, not all visualization
systems fit conveniently. For example, visualizations that present temporally ordered
3-Dimensional data could fit into either the temporal- or the 3-Dimensional categories.
To overcome these inconsistencies Card and Mackinlay [15] offer a classification based
on additional factors that need to be considered during visualization. Their analysis of
visualization systems considers not only the type of data, but also the filtering functions
applied to them, the controlled (text) and automatic (glyph) processing techniques, the
viewing transformations, and the user interaction elements for every variable in the
visualization. Data types are classified as [15, pp. 92-93]:
• Nominal, meaning they are only equal or unequal to other values.
• Ordinal, meaning they obey a less-than relation.
• Quantitative, meaning it is possible to do arithmetic on them.
• Intrinsically spatial, which are the subset of quantitative types that represent spa-
tial points.
• Geographical, which are the subset of intrinsically spatial types that represent
geographic locations.
• A Set mapped to itself, which is the case in graphs and trees.
This taxonomy is cumbersome for the purpose of categorizing visualization techniques.
Unlike the preceding taxonomies, there is no single category for a visualization tech-
nique. Instead, each variable used in the visualization is decomposed according to
twelve factors and presented in a matrix. The matrices of two techniques can be com-
pared to pinpoint the exact differences between them. The intent of the authors is not
only to describe the differences in visualization techniques, but also to suggest new
possibilities for visualization techniques [15, pp. 92].
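A simplified illustration of such matrix comparison is sketched below; the factor names and technique descriptions are hypothetical stand-ins for Card and Mackinlay's twelve factors, chosen only to show how a factor-by-factor diff can pinpoint where two techniques differ:

```python
# Hypothetical, much-reduced matrices: each visualization variable is
# described by a few factors (stand-ins for the twelve real ones).
scatterplot = {
    "x": {"data_type": "quantitative", "mark": "point", "control": "automatic"},
    "y": {"data_type": "quantitative", "mark": "point", "control": "automatic"},
}
barchart = {
    "x": {"data_type": "nominal", "mark": "bar", "control": "automatic"},
    "y": {"data_type": "quantitative", "mark": "bar", "control": "automatic"},
}

def compare_matrices(a, b):
    """Pinpoint the exact factor-level differences between two techniques."""
    diffs = []
    for var in sorted(set(a) | set(b)):
        fa, fb = a.get(var, {}), b.get(var, {})
        for factor in sorted(set(fa) | set(fb)):
            if fa.get(factor) != fb.get(factor):
                diffs.append((var, factor, fa.get(factor), fb.get(factor)))
    return diffs

for diff in compare_matrices(scatterplot, barchart):
    print(diff)
```

An empty diff means the two techniques are indistinguishable under the chosen factors, which also hints at how "gaps" between matrices can suggest unexplored techniques.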
Chi [16] provides a taxonomy to help implementers understand how to implement
visualization techniques. The proposed taxonomy is based on their earlier work on
the Data State Reference Model [18]. A visualization technique is broken down into
four stages according to the state of the data, as shown in Table 2.2, and the data
transformation operators that transform the data from one stage to another, as listed in
Table 2.3.
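As a minimal sketch, the four stages and three operators of the Data State Reference Model might be wired together as below; the concrete abstractions chosen here (summary statistics, axis ranges, a textual view) are illustrative assumptions, not Chi's definitions:

```python
def data_transformation(values):
    """Value -> analytical abstraction: extract meta-data from the raw data."""
    return {"min": min(values), "max": max(values), "count": len(values)}

def visualization_transformation(abstraction):
    """Analytical abstraction -> visualization abstraction: visualizable content."""
    return {"axis_range": (abstraction["min"], abstraction["max"]),
            "bar_count": abstraction["count"]}

def visual_mapping_transformation(vis_abstraction):
    """Visualization abstraction -> view: the presentation the user sees."""
    lo, hi = vis_abstraction["axis_range"]
    return f"{vis_abstraction['bar_count']} bars on axis [{lo}, {hi}]"

raw = [3, 7, 1, 9]                        # stage 1: value
meta = data_transformation(raw)           # stage 2: analytical abstraction
vis = visualization_transformation(meta)  # stage 3: visualization abstraction
view = visual_mapping_transformation(vis) # stage 4: view
print(view)  # -> "4 bars on axis [1, 9]"
```

Each function corresponds to one operator of Table 2.3, and each intermediate value to one stage of Table 2.2.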
Stage                      Description
Value                      The raw data.
Analytical Abstraction     Data about data, or information, a.k.a. meta-data.
Visualization Abstraction  Information that is visualizable on the screen using a visualization technique.
View                       The end-product of the visualization mapping, where the user sees and interprets the picture presented to her.
Table 2.2: Data Stages in the Data State Model
Tory and Möller [115] argue that these taxonomies are vague because of the termi-
nology used. As an example they cite the use of the word “often” in Card and Mackin-
lay’s definition [115, pp.1]. To reduce this ambiguity, their taxonomy is based on the
data model rather than the type of data itself. A data model is a representation of data
Processing Step                Description
Data Transformation            Generates some form of analytical abstraction from the value (usually by extraction).
Visualization Transformation   Takes an analytical abstraction and further reduces it into some form of visualization abstraction, which is visualizable content.
Visual Mapping Transformation  Takes information that is in a visualizable format and presents a graphical view.
Table 2.3: Transformation Operators in the Data State Model
that may include structure, attributes, relationships, and the data values themselves.
Visualization algorithms create visual representations of data using a data model. The
taxonomy is outlined in Figure 2.11. The visualization algorithms are first classified
as continuous or discrete. Scientific visualization corresponds largely to continuous
models while information visualization corresponds largely to discrete models.
Figure 2.11: The Model-based Visualization Taxonomy
Unlike Card and Mackinlay’s taxonomy, the model-based taxonomy maintains
scalar, vector, and tensor categories for dependent variables. Additionally the taxon-
omy shows greater flexibility than Shneiderman’s taxonomy, categorizing temporally
ordered 3D data as nD data in the continuous model. A limitation of the taxonomy is
that it does not treat temporal data as distinct from 1D data.
2.4 Information Uncertainty Visualization Approaches

Johnson and Sanderson argue that “development of formal theoretical frameworks and
the new visual representations of error and uncertainty will be fundamental to a better
understanding of 3D experimental and simulation data” [51, pp. 5]. This is a relatively
new field in visualization, which is generally referred to as uncertainty visualization.
However, uncertainty and its sources are diverse and the term can have broad meaning.
On the other hand, error visualization (e.g. [51, 83]) and fuzzy visualization (e.g. [94,
39, 5]) imply particular uncertainty modeling techniques. In this thesis we use the
term information uncertainty visualization to refer to visualization of all modeling
techniques where the uncertainty can be codified in information. Thus error- and fuzzy-
visualization are sub-categories of information uncertainty visualization, which is itself
a sub-category of uncertainty visualization. This relationship is shown in Figure 2.12.
Figure 2.12: Relationship between Uncertainty Visualization, Information Uncertainty Visualization, Error Visualization, and Fuzzy Visualization
Visualisation techniques map data variables and information to visual feature di-
mensions for the purpose of highlighting trends, making comparisons, identifying
outliers, examining data composition, and so on. The introduction of uncer-
tainty requires that appropriate visual features be selected to represent it. Blurring sim-
ulates the visual percept caused by an incorrectly focused visual system and therefore
has the most immediately intuitive mapping for uncertainty [92]. Blurring effectively
smears the boundary of the graphic representing the data value, creating a sense of
uncertainty as to where it begins and ends. A number of visual features may be used
in a similar manner, including hue, luminance, and saturation, and these mappings can
be extended into the temporal domain through animation [67, 34, 10].
Brown [10, pp. 84] offers a summary of available features drawn from the literature
(e.g. [44, 33, 91]):
• Intrinsic representations - position, size, brightness, texture, color, orientation,
and shape;
• Further related representations - boundary (thickness, texture and color), blur,
transparency and extra dimensionality;
• Extrinsic representations - dials, thermometers, arrows, bars, different shapes,
and complex objects such as pie charts, graphs, or complex error bars
2.4.1 Low-level Features

We now consider how several low-level features can be used to indicate uncertainty
within information. Low-level features refer to techniques applied to individual ob-
jects within a visualization. The next section describes high-level features, which refer
to techniques that involve the arrangement of two or more objects. These high-level
constructions build on the use of low-level features. The features to be considered are:
hue and luminance, opacity, blurriness, depth, texture, particles, glyphs, and sonifica-
tion.
Hue and Luminance are commonly used to highlight data that is different, or to rep-
resent gradients in the data [117, 56]. Saturation of the hue can be used to high-
light the precision or certainty of the data. The more saturated the hue, the
more certain or crisp the value contained in that region is, while low saturation
regions have the appearance of washing into each other, and can be used to in-
dicate the fuzziness of spatial region boundaries [50, 42]. Variation in hue can
also be used to indicate precision. Regions of higher uncertainty can have fewer
shades, while more precise areas have a smoother appearance. A lack of
background/foreground separation (e.g. red on purple) can also imply uncertainty,
as the region may only just be distinguishable [124]. Brown and Pham [94] used
the color hues to represent the membership values of data points. Color hues
were also used by Lowe et al. [70] to represent belief values in the form of a
flame to facilitate decision making in an anaesthetic monitoring system.
Opacity offers an intuitive method for implying uncertainty. The more uncertain re-
gions can be shown with reduced opacity, creating a ghost-like effect. The in-
verse approach, used by Djurcilov et al. [24, 25], is to map regions of high un-
certainty to high opacity, thus drawing attention to the uncertain areas in volume
visualization (see Figure 2.13). Johnson and Sanderson [51] show an example
of a Magnetic Resonance Imaging (MRI) scan with an added error volume. The
error volume represents the space of possible variation and is transparent so that
the other data is still visible.
Figure 2.13: Using opacity to show the structure of uncertainty. Color scheme (left), Normal rendering (centre), Uncertainty structure (right)
Blurriness applies the same concepts as opacity to spatial mappings. In this way the
uncertainty is indicated by the imprecise position and extents of objects.
Depth can be used to indicate an order or spatial positioning for the data. Pang et
al. [86] and Brown [10] displayed intentionally different images to each eye,
exploiting a lack of binocular fusion to indicate fuzziness. Blurring or depth
of field effects from spatial frequency components being removed in the image
plane can be used to show the indistinct nature of data points [34, 64].
Texture may be applied to indicate the level of precision, ambiguity, or fuzziness
upon an object or upon a spatial location. Pang
and Alper [84] used random normal perturbation to create a textured surface.
The effect was proportional to the amount of uncertainty, creating rough regions
where the uncertainty is high. Certain shimmering effects, usually to be avoided
in visualization [117], can be used to indicate ambiguity within the region [113].
Particles can be used to represent the uncertainty of a region or object by varying
their density, opacity, and color. Grigoryan and Rheingans [37, 38] use particle
density to indicate uncertainty. These particle clouds create a similar effect to
transparent volumes. Cartography often also uses a form of this by drawing
dashed lines to represent imprecise lines and boundaries, or by using different
dot densities to represent shading effects [36].
Glyphs are the most widespread method for displaying uncertainty. The size of a
glyph is often used to indicate a scalar measure of uncertainty. For example, error
bars are a traditional technique for indicating errors in measurement [117]. The
larger the error bar, the more uncertainty there is. This concept was expanded
upon by Pang and Freeman [85], who used the size of spherical and ellipsoidal
glyphs to indicate uncertainty in radiosity applications. Lodha et al. [67] inves-
tigated uncertainty glyphs for flow visualizations, also using length to indicate
degree of disagreement. In separate work [68] they used glyphs to show variation
between surface interpolants, finding them to be more precise than using other
features. Wittenbrink et al. [125] mapped variation in vectors to glyph length
and width, to show uncertainty in magnitude and direction. In the same work
they explored glyphs in keyframed animation to expose differences between in-
terpolation techniques.
Sonification is an approach that was explored early on. There are two main meth-
ods, one is to map the uncertainty directly to the pitch or volume, while the
second uses the degree of uncertainty to regulate a noise generator. Fisher [31]
allowed the user to scan a cursor over a landscape while the program emitted
sound depending on the degree of uncertainty. Lodha et al. [66] went further by
allowing multiple sound variables to be mapped simultaneously, thus increasing
the amount of information conveyed.
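As an illustration of the saturation and opacity mappings discussed above, the following sketch desaturates and fades a color in proportion to its uncertainty; the scaling constants are arbitrary assumptions, and Djurcilov et al.'s inverse mapping would instead increase opacity with uncertainty:

```python
import colorsys

def uncertainty_to_color(hue, uncertainty):
    """Map uncertainty in [0, 1] to a desaturated, more transparent color.
    Fully certain values render saturated and opaque; uncertain values
    wash out and fade into a ghost-like appearance."""
    saturation = 1.0 - uncertainty
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    alpha = 1.0 - 0.8 * uncertainty  # keep even vague regions faintly visible
    return (round(r, 3), round(g, 3), round(b, 3), round(alpha, 3))

certain = uncertainty_to_color(0.0, 0.0)  # pure, opaque red
vague = uncertainty_to_color(0.0, 1.0)    # washed-out, nearly transparent
print(certain, vague)
```

The same scalar could equally drive blur radius, particle density, or glyph size; the point is that a single uncertainty value feeds a low-level visual channel.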
These low-level features offer an added dimension to which we can map uncertainty
information for a particular plot point. Zhou and Pang [126] looked at several examples
to visualize the level of error between original and reduced resolution meshes in a
multi-resolution mesh algorithm (see Figure 2.14). We now consider how these are
used in higher level constructions and methods that require multiple data points. In our
discussion we include different spatial arrangements, use of image based techniques,
addition and modification of geometry, and the use of animation.
Figure 2.14: Some visual mappings for showing difference. From left to right: overlay, rainbow mapping, white-black-white pseudo-coloring, glyph (hi-pass filter), glyph (low-pass filter)
2.4.2 Higher-level Constructions

Uncertainty can be represented in several ways using 2D Cartesian graphs. Some ex-
amples of graphs include histograms, bar charts, tree diagrams, time histories of 1D
slices, maps, iconic and glyph-based diagrams. For example, graphs are often used to
represent the fuzzy membership functions (e.g. Figures 2.1-2.4) or probability density
functions. The structure and inter-relationships of rules can be illustrated using graphs,
trees and flowcharts.
Fuzzy rules involving two inputs can be graphed in three dimensions. Figure 2.15
shows an example from the Matlab Fuzzy Toolbox [75], where the output shows the
amount of tip, as determined by the quality of the food and service. Nürnberger ex-
plored drawing such classifiers as overlapping pyramid shapes. 2D classifiers are visu-
alized as contours for a top-down view [80], whereas 3D classifiers are 3D shapes. An
extension to this work discusses the effects that antecedent pruning has on the shapes
[81]. Pruning of antecedents involves removal of restrictive rules and simplification
of existing rules with the aim of improving the ability of the classification system to
generalize to previously unseen input data. The authors argue that rule simplifications
can have a dramatic impact on results and that visualization of these changes can pro-
vide an intuitive aid for fuzzy classifier designers. While the technique produces an
intuitive aid, the authors have not gone far enough. Since the classifier is visualized as
a shape that occupies the same space as the data, it suggests that it can be visualized
together with the data. This would allow the user to observe how the data points of a
particular data set classify, particularly when combined with animation or interactive
techniques. Possible extensions include using size, color, and translucency to enhance
perception of the classification given to a data point. Cox et al. [20] applied thresholds
to produce convex hull plots of data point clusters, using glyphs of different shapes and
sizes for the data points.
Figure 2.15: How much tip should be given based on the quality of the food and service, using fuzzy inference
A limitation of these techniques is that they are not well suited to multi-dimensional
data. Techniques such as multi-dimensional scaling [6] and parallel coordinates [39]
provide ways to display multi-dimensional fuzzy data in 2D without loss of informa-
tion. However, the degree of membership is not indicated in a standard parallel coordi-
nate plot. Berthold and Hall [5] use blurring to expose the level of fuzzy membership
on parallel coordinates. An alternative proposal by Pham and Brown [91] extends
coordinates to the third dimension, where the new dimension represents the member-
ship value. One technique for multi-dimensional scaling involves an algorithm that
minimizes the error in the inter-point distances. The rule set is then visualized as a 2D scatter
plot, where grey scales denote different classes and the size of each square indicates
the number of examples [94]. Another technique for viewing high-dimensional fuzzy
rules in 2D places rules as shapes on a grid. The distance between rules in high-
dimensional space is mapped to their distance in 2D. The technique uses a gradient
descent algorithm to minimize the error between the 2D and actual distances [6].
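A minimal sketch of such a gradient-descent layout is given below; the step size, iteration count, and update rule are illustrative assumptions rather than the algorithm of [6]:

```python
import math, random

def mds_2d(points, steps=2000, lr=0.05, seed=0):
    """Sketch of gradient-descent multi-dimensional scaling: place each
    high-dimensional rule/point in 2D so that pairwise 2D distances
    approximate the original high-dimensional distances."""
    rng = random.Random(seed)
    n = len(points)
    # Target distances measured in the original space.
    target = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Random initial 2D layout.
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                # Move i along the i-j direction to shrink the distance error.
                g = lr * (d - target[i][j]) / d
                pos[i][0] -= g * dx
                pos[i][1] -= g * dy
    return pos

rules = [(0, 0, 0), (1, 0, 0), (0, 3, 0)]  # three "rules" in 3D
layout = mds_2d(rules)
print(round(math.dist(layout[0], layout[1]), 2))
```

Three points always embed exactly in 2D, so the residual error here comes only from the finite number of descent steps; for larger rule sets the minimized stress is what the 2D picture distorts.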
When visualizing clusters it is often a requirement to find outliers in the data. One
method to improve the identification of outliers in fuzzy classification problems is to
modify the “objective function”. Keller proposed additional weighting parameters for
“representativeness” [55, pp. 143]. The application of this technique produces the
same principal clusters, but outliers are more easily detected since they are excluded
to a greater degree from the fuzzy clusters.
Fujiwara et al. [32] and Gershon [34] produced a 3D flowchart to represent rule
structure to facilitate understanding of rule-based programs. This is an extension of
the cone tree visualization technique [102]. Dickerson et al. [23] used a graph to
encode relationships in a complex interacting system. This technique is useful for
encoding expert information which is commonly present in fuzzy control systems.
Brown and Pham [11] extended these techniques further by mapping uncertainty to
additional features (such as opacity) for each node.
Image based techniques can also be used to convey uncertainty. These methods are
the uncertainty analogs of image-based visualization techniques such as Line Integral
Convolution (LIC) [40]. In these methods a pattern is generated that abstractly reflects
the uncertainty. One difference between image based techniques and glyphs is that
image based techniques apply a regular pattern over a continuous area. This avoids
clutter sometimes experienced by glyph techniques where the glyphs obstruct one an-
other. Sanderson et al. [104] used reaction-diffusion models in flow visualizations and
conveyed uncertainty through spot size and orientation.
Pang and Freeman [85] (see also [86]) observed that geometry can be added or
modified to indicate uncertainty. An example of modification is to create a texturing-
like effect by perturbing the orientation of faces within a geometric mesh model. The
amount of perturbation is governed by the degree of uncertainty. There are two com-
mon examples of adding geometry. The first is to add geometry for a single data point,
typically to give a direct indication of the extent over which the object can exist. An-
other is to connect successive data point extents, simulating the volume of possibility.
An example of the latter was demonstrated by Lopes and Brodlie [69], who used tubes
for particle flow visualization.
All of the low-level visual features that have been discussed can be animated. For
example, using motion blur, flickering, animated glyphs, etc. to represent the precision
of the measurements of a moving object [125, 10]. Brown [10] explored temporal vi-
brations for conveying uncertainty. The vibrations oscillate between values fast enough
to be pre-attentive [116], causing a shimmering effect that implies uncertainty about
its true position. Figure 2.16 shows frames from a movie of the luminance oscillation
technique, with a region of high uncertainty framed within the dashed rectangle. This
technique can also be applied in stereographic displays to facilitate a lack of binocular
fusion [10].
Figure 2.16: Two frames from an animation that uses a shimmering effect to indicate uncertainty by oscillating luminosity in regions of high uncertainty
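The luminance-oscillation idea can be sketched numerically as follows; the amplitude scaling and oscillation rate are illustrative assumptions, not parameters from Brown's implementation:

```python
import math

def luminance_frames(base, uncertainty, n_frames=8, rate=2.0):
    """Sketch of luminance oscillation: a region's luminance oscillates
    around its base value with amplitude proportional to the region's
    uncertainty, producing a shimmer where uncertainty is high."""
    frames = []
    for k in range(n_frames):
        phase = 2 * math.pi * rate * k / n_frames
        lum = base + 0.5 * uncertainty * math.sin(phase)
        frames.append(max(0.0, min(1.0, lum)))  # clamp to displayable range
    return frames

steady = luminance_frames(0.5, 0.0)   # certain region: no shimmer
shimmer = luminance_frames(0.5, 0.8)  # uncertain region: visible oscillation
print(max(shimmer) - min(shimmer))
```

Played back faster than the pre-attentive threshold, the larger swing in the uncertain region reads as shimmer while the certain region stays visually still.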
Probability distributions are often graphed as 2D line graphs. Kao et al. [52, 53,
54] explored showing multiple data points, each of which is subject to a probability
distribution. In one example they overlaid the probability density functions for points
of interest, as shown in Figure 2.17. Other approaches included using color, texture,
and heightmaps to indicate uncertainty. Luo et al. [72] plotted many small histograms
in a small multiples [117] technique.
The Geographic Information Systems (GIS) field has had a particular interest in
information uncertainty visualization. MacEachren et al. [74] and Slocum et al. [109]
methodically review the state of play with respect to uncertainty visualization in GIS.
Figure 2.17: A visualization that draws the probability density function over associated data points
Outside of this field, the development of visualization techniques for information un-
certainty is typically ad hoc, being created for a specific modeling technique or appli-
cation. All of these represent important steps forward; however, an integrated frame-
work to manage the modeling and visualization of information uncertainty is currently
missing.
2.5 Summary

In summary, the fields of information uncertainty modeling and general visualization
have each been well studied separately. Several visualization techniques have been created
for the various information uncertainty models. However, which one is best suited to
the task at hand, and how will uncertainty modeling and propagation be tracked and
interpreted properly? What happens when the information uncertainty changes? In-
formation uncertainty modeling represents our knowledge and expectation about the
behavior of a variable under uncertainty, and this knowledge may be subject to change
over time, particularly as new information comes to light. Currently, there is no inte-
grated framework for the modeling, propagation, and visual mapping of information
uncertainty. Furthermore, there is no framework that can adapt to changes in informa-
tion uncertainty.
CHAPTER 3
Framework for Integrated Uncertainty Modeling and Visualization
3.1 A New Approach to Information Uncertainty

Traditional visualization systems, which typically do not deal with information uncer-
tainty, can still be subject to dynamic data. For these systems the dynamism refers to
changes in value. The result is that the visualization needs to be recalculated, which is
a straightforward process. Changes in information uncertainty, on the other hand, pro-
vide a unique challenge: the actual modeling technique used can change in response
to changing information. Therefore, the data type of the variable can be dynamic. The
data type refers to the form in which the information is stored and managed. This is
clearly illustrated by the case of prediction: before the event comes to pass, the un-
certainty might be modeled using a number of different techniques; once the event
has come to pass, the prediction can be updated with the actual outcome.¹ Thus the
visualization must not only be recalculated, but must also adapt to this new data type.
Adapting a visualization to new data types is not a straightforward process.
Visualization techniques are designed for a particular data type and may not support the
new data type without modification. For example, line graphs rely on a series of values
between which line segments are connected. Should the source of information be
defined by a series of intervals, then the traditional line graph is no longer appropriate.
One suitable modification turns the line segments into convex polygons, whose edges
are defined by the upper and lower bounds of the interval.
¹ Assuming the outcome is known, the data type then becomes one of absolute certainty.
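The line-graph modification just described, where each segment becomes a convex polygon bounded by the interval's upper and lower bounds, can be sketched as follows; the function name and data are illustrative:

```python
def segment_polygons(xs, lows, highs):
    """Each line segment of a traditional line graph becomes a convex
    quadrilateral whose top and bottom edges follow the upper and lower
    interval bounds at the two endpoints."""
    polys = []
    for i in range(len(xs) - 1):
        polys.append([
            (xs[i],     highs[i]),      # upper-left
            (xs[i + 1], highs[i + 1]),  # upper-right
            (xs[i + 1], lows[i + 1]),   # lower-right
            (xs[i],     lows[i]),       # lower-left
        ])
    return polys

polys = segment_polygons([0, 1, 2], lows=[1.0, 1.5, 1.2], highs=[2.0, 2.5, 3.0])
print(len(polys), polys[0])
```

Rendering these quadrilaterals in place of the line segments yields the band-style graph of Figure 3.1(c); a point-valued series (lows equal to highs) degenerates back to the ordinary line graph.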
The importance of conveying uncertainty in visualization systems is well recognized [83].
Many problems are subject to uncertainty and, as a consequence, visualization research
has produced several modifications to visualization techniques to support uncertainty.
For example, transparent volumes have been added to volume renderings to indicate
potential error (e.g. [51, pp.9]), and parallel plots were extended into the third dimen-
sion to handle fuzzy variables [94]. This type of work continues, and the outcomes
continue to be data type specific.
The objective of this thesis is to integrate the process of modeling and visualizing
information uncertainty into an extensible and adaptive visualization framework. Such
a framework will provide greater uniformity for the field and enable both practitioners
and researchers to reduce the data type and visualization technique dependency. The
process that the user follows when armed with such a tool can therefore change.
The typical process that a user follows when dealing with uncertainty consists of
the following steps.
1. Decide on variables
2. Decide on uncertainty data type(s)
3. Build the data model, propagating uncertainty manually
4. Construct visualization(s), incorporating uncertainty where techniques are avail-
able and appropriate
In practice, steps 3 and 4 will be repeated as information changes. However, step 2 will
rarely be revisited. The significant point of this process is that the uncertainty model
is decided upon before the user’s data model is built. This can be unintuitive, as the
amount of uncertainty can change depending on how the model evolves. If it were
easy to add, change, or remove uncertainty details at any point in the process, then the
typical process changes, as follows.
1. Decide on variables
2. Build an initial data model
3. Construct visualization(s)
4. Add/remove/change uncertainty information
Step 4 can occur anywhere after step 1 and can be repeated as often as is necessary.
Under such a process, the uncertainty information is viewed as a refinement of details
that does not fundamentally change the data model.
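One way to support such deferred refinement can be sketched as follows; `UncertainValue` and its methods are hypothetical names invented for illustration, not part of an existing library:

```python
class UncertainValue:
    """Sketch of treating uncertainty as a removable refinement: the data
    model works with plain values, and uncertainty details can be attached,
    changed, or dropped at any point without restructuring the model."""

    def __init__(self, value, uncertainty=None):
        self.value = value
        # e.g. None, ("interval", lo, hi), ("normal", mean, stddev), ...
        self.uncertainty = uncertainty

    def set_uncertainty(self, model):
        self.uncertainty = model

    def clear_uncertainty(self):
        self.uncertainty = None

# Step 2: build the data model with plain values first.
employment = UncertainValue(17.2)
# Step 4 (later, repeatable): refine with uncertainty details.
employment.set_uncertainty(("interval", 16.8, 17.9))
print(employment.value, employment.uncertainty)
employment.clear_uncertainty()  # the data model itself is unchanged
```

Because the value and its uncertainty annotation are held together but kept distinct, changing the uncertainty technique never forces the variable, or the model built around it, to be restructured.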
This chapter describes an integrated framework for the modeling and visualization
of information uncertainty. This framework is adaptive to changes in uncertainty in-
formation, allowing the user to select the appropriate techniques for the task at hand.
In Section 3.2 we consider the issues that must be overcome, from which we derive re-
quirements of this framework. Section 3.3 describes the components of the framework
to meet the requirements. Section 3.4 provides a summary of key points.
3.2 Analysis of Issues and Requirements

This section examines the issues that confront users when they seek to visualize infor-
mation uncertainty. From a theoretical perspective there are three main issues. Firstly,
visualization techniques are based around specific uncertainty data types. Thus, the
selection and application of visualization techniques has a tendency to be ad hoc.
Secondly, there is incoherence between information uncertainty modeling techniques.
This locks users into a particular modeling technique, the appropriateness of which
may change as the information evolves. Thirdly, information uncertainty modeling and
visualization is hampered by an artificial separation between the value of a variable and
the uncertainty model of that value. This poses problems that affect the robustness of
user models and the effort required to maintain them.
From a practical point of view, the user is required to have both a comprehen-
sive understanding of uncertainty as well as sophistication with visualization tools.
Comprehensive understanding is required, because the user must manually encode and
propagate uncertainty information; sophistication with visualization tools is required
to allow for the unusual demands of mapping uncertainty to visual elements. Many
tools lack support for information uncertainty modeling and visualization, leading de-
termined users to cobble together multiple tools.
3.2.1 Ad hoc Visualization Techniques

Sensemaking Cycle and Changes in Uncertainty Information
Visualization is “the bringing out of meaning in information” [56]. It is performed it-
eratively and usually as part of the sensemaking cycle [103, 17]. The iterative looping
is not exclusive to mapping data into visual form; instead, users sometimes return to
the data model to gather or transform data. This is particularly true for information un-
certainty. For example: uncertainty details can be deemed to be more important later,
once the basic model is in place; or the uncertainty details may change as more be-
comes known about the variables. Therefore, frameworks for information uncertainty
visualization should ideally allow the user to go back to make changes with minimal
effort.
Flexibility
Visualization of information uncertainty is different to visualizing other forms of in-
formation for two main reasons. Firstly, information uncertainty is always associated
with a particular unit of information. This means that the uncertainty cannot be freely
visualized without regard to its interpretation relative to the information to which it
belongs. Secondly, information uncertainty is usually mapped differently to visual el-
ements. For example, uncertainty is commonly mapped to intrinsic properties, such as
transparency or color; or by adding a dimension to geometry, such as using a surface
where there would otherwise be a line. Therefore, a visualization system for informa-
tion uncertainty requires the flexibility to allow users to map uncertainty to compound
visual elements, including intrinsic properties and adding dimensions to geometry.
Figure 3.1 demonstrates how information uncertainty is associated with informa-
tion, but typically mapped differently to visual elements. Four graph visualizations of
historical and predicted employment rates in California are shown. The first graph (a)
assumes that growth will continue at the average growth rate of the past 15 years and
is therefore visualized using traditional means. While the information in graph (a) is
modeled as not being subject to uncertainty, it rests on an unreliable assumption about employment rates. The graph in (b) likewise estimates that growth will continue
at the average rate. The fact that the predictions are estimates is indicated by the line
stippling, an intrinsic property of the line. The graph in (c) shows the possible range
within the maximum and minimum growth rates experienced in the past 15 years. The
uncertainty is indicated by extending the one dimensional line into a two dimensional
polygon. The graph in (d) uses a normal distribution centered on the average growth
rate. The uncertainty is indicated both by extending the dimensionality of the line and by mapping to the intrinsic property of opacity.
Heterogeneity in Uncertainty Information
Several uncertainty visualization techniques have been developed for particular uncertainty types. However, in an environment where the uncertainty type can change to better suit the needs of the user, such restrictive preconditions mark a return to the tyranny of uncertainty type lock-in. Therefore, visualization of information uncertainty requires greater consistency across different uncertainty modeling techniques.
Homogeneous Access
To enable the visual mappings that expose the uncertainty in variables, it is necessary
to have access to the associated uncertainty details. However, there are numerous un-
certainty modeling techniques that use different methods for encoding the uncertainty.
Figure 3.1: Visualizations of Employment Numbers in California. Years 2005-2010 are predicted. (a) Assuming Average Growth (b) Indicating Growth is Estimated (c) Possible Growth (d) Likely Growth. (Data Source: California Employment Development Department)
This creates a barrier to visualizing uncertain information because visual mappings
that work with one uncertainty modeling technique may not be easily transferable to
another. Such inconsistency creates a strong dependency between visualizations and
the data types used in the model, limiting the user’s ability to update the data model.
Therefore, a generalized means for accessing uncertainty information should be sought
to enable a consistent environment for information uncertainty visualization.
Plurality of Values
Fundamental to the concept of information uncertainty is the ability for a variable to
hold multiple values simultaneously; in other words, the variable has multiple possible
collapses. This plurality of values represents the deferral of the approximation decision: the true value of a variable may be one of multiple candidates, each of which should
be considered a possibility.
3.2.2 Incoherence of Uncertainty Models

Uncertainty Data Type Lock-in
There is usually no support for changing from one uncertainty modeling technique to
another. Adding uncertainty information to data allows the user to specify a greater
level of detail about the data. However, changing the uncertainty data type typically
requires users to reconstruct the affected portion of the data model, often involving a
fundamental change in form. This makes the data model rigid and, as a consequence,
users will typically need to anticipate their use of uncertainty and build their model
accordingly.
Manual Propagation of Uncertainty
Since it is usually up to the user to manage and interpret uncertainty parameters, the
use of information uncertainty requires the user to have a mathematical understanding
of modeling techniques. This would explain why, although uncertainty modeling is
common in disciplines such as engineering and physics, it is often under-used in other
domains. While only some understanding is required when declaring the uncertainty,
the subsequent propagation of uncertainty, due to interaction between uncertain vari-
ables, requires more detailed understanding of the mathematical principles involved.
To ease this burden, the system should facilitate the automatic propagation of uncer-
tainty.
There are different mathematical models that are available for the propagation of
uncertainty under the various uncertainty data types. For some applications it is im-
portant that a particular mathematical model be used, and therefore the user must be capable of specifying which model applies. In most environments it is up to users to
implement the correct model. However, any automatic propagation system must also
facilitate this choice.
Propagation in Heterogeneous Operations
The automatic propagation system should also cope with the situation where an op-
eration combines two variables of different uncertainty data types. One solution is to
comply with the principle of requisite generality [60], where variables are converted
into an uncertainty type general enough to express the resulting uncertainty. However,
there may be multiple methods for conversion. The user may wish to provide a specific
mathematical model that is better suited to their domain, or they may even wish to dis-
card some uncertainty information as a simplification. Therefore, the propagation of
uncertainty in heterogeneous operations should also facilitate choice of mathematical
model.
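As an illustration of requisite generality, the following sketch (hypothetical Python; the framework itself is not bound to this representation) promotes an exact value to a degenerate interval so that a single interval rule can combine heterogeneous operands:

```python
# Sketch under assumption: a minimal "requisite generality" promotion,
# where an exact quantity combined with an interval is first widened into
# a (degenerate) interval so the general interval rule can apply.
def to_interval(value):
    if isinstance(value, tuple):   # already an interval (lo, hi)
        return value
    return (value, value)          # exact value -> degenerate interval

def add(a, b):
    (alo, ahi), (blo, bhi) = to_interval(a), to_interval(b)
    return (alo + blo, ahi + bhi)

add((8, 12), 5)   # interval + known quantity -> (13, 17)
```

A domain-specific conversion, or one that deliberately discards detail as a simplification, would replace `to_interval` while leaving the dispatch unchanged.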
3.2.3 Artificial Separation of Information and Uncertainty

Separability of Parameters
The closely related parameters of the uncertainty data type are often treated as separate
variables. For example, rather than declaring a variable as being modeled using a prob-
ability, many environments require separate variables for mean and variance. Thus, if
α were a variable subject to a normal probability distribution, it is stored as two separate variables: α_μ and α_σ. This lack of structure is akin to use of the goto statement
before the advent of structured programming, because the burden is upon the user to
treat these variables as being connected. This separation has two significant ramifi-
cations: firstly, it is easier to introduce errors since the environment does not enforce
any semantic properties of the uncertainty parameters; and secondly, the introduced
complexity discourages users from using uncertainty modeling techniques. Therefore,
the uncertainty parameters should be treated as part of a unit and the system should
enforce the semantics of the parameters.
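The contrast can be sketched as follows (illustrative Python; the class and its propagation rule are assumptions, not the prototype's API): the two parameters live inside one value whose semantics the environment can enforce, rather than being two loose variables the user must keep in sync.

```python
# Illustrative sketch: mean and standard deviation encapsulated in one unit.
class Gaussian:
    def __init__(self, mean, sigma):
        if sigma < 0:
            # The environment enforces the semantics of the parameters.
            raise ValueError("standard deviation must be non-negative")
        self.mean, self.sigma = mean, sigma

    def __add__(self, other):
        # Sum of independent normals: means add, variances add.
        return Gaussian(self.mean + other.mean,
                        (self.sigma ** 2 + other.sigma ** 2) ** 0.5)

alpha = Gaussian(5.0, 1.0)   # one unit, not two separate cells
```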
Model Rigidity
For users to visualize information uncertainty, that uncertainty must be declared some-
where. The declaration of the information uncertainty should be co-located with the
information to which it relates, since the two are fundamentally connected. However,
this relationship is neglected in most environments, which instead require the user to
declare the parameters of the uncertainty separately from the variable. For example,
the variable α might have the value 5, but another variable, α_confidence, is required to hold information about the certainty of α. This results in an added layer of complexity and the user is faced with an increasingly intricate data model. Furthermore, there
is often no support for changing from one uncertainty modeling technique to another, so changing techniques typically requires the user to reconstruct the affected portion of the data model. This rigidity again forces the user to anticipate their use of uncertainty and build their model accordingly, and the data model becomes progressively more intricate and error prone.
3.3 Components of the Framework

Several components are required to build an integrated uncertainty modeling and vi-
sualization framework. Figure 3.2 illustrates the components of the framework and
the chapter that describes each component in detail. The three main components are
the spreadsheet paradigm for information uncertainty, uncertainty encapsulation and
automated propagation, and uncertainty abstraction for visualization.
[Diagram: the framework (Chapter 3) comprises the Spreadsheet Paradigm for Information Uncertainty (Chapter 4); Uncertainty Encapsulation and Automated Propagation (Chapter 5), covering uncertainty encapsulation and uncertainty propagation models; and Uncertainty Abstraction for Visualization (Chapter 6), covering uncertainty abstraction models and user-objectives based visualization.]
Figure 3.2: Description of the framework
Features of the framework are:

• the ability to change uncertainty details in the model whenever necessary;

• no requirement for the user to be intimately aware of uncertainty modeling techniques, since the system manages propagation and avoids basic mistakes;

• built-in support for visualizations that incorporate information uncertainty into their display;

• built-in support for modeling of information uncertainty;

• extensibility, allowing practitioners to add new data types and visualization techniques in future.
We now give an overview of each of the three components.
3.3.1 Spreadsheet Paradigm

Visualization systems are commonly based on a data-flow network paradigm. How-
ever, the spreadsheet paradigm has been shown to offer advantages for visualiza-
tion [45, 46]. Features of the spreadsheet are that it visually lays out documentation,
intermediate data, and model logic for inspection. The data contained in a spreadsheet
is always up to date, which offers an intuitive view of the supporting data used in a vi-
sualization. Furthermore, spreadsheets are widespread for modeling tasks and widely
understood. The framework described here builds on the spreadsheet paradigm to take
advantage of these features.
A fundamental extension that we make to the spreadsheet paradigm is to support
uncertainty information at the sub-cell level. Figure 3.3 shows the new conceptual
structure. Workbooks are the top level objects and typically there will only be one
workbook open at a time. The workbook contains several spreadsheets, which are
a matrix of cells. Each cell contains a unit of data, typically either a text label, a
formula, or a numeric value. The cell types are extended such that each cell may
include uncertainty details.
[Diagram: Workbooks have Spreadsheets, which have Cells, which have Uncertainty Parameters.]
Figure 3.3: Additional Layer in Spreadsheet Hierarchy
The use of formulae creates functional relationships between cells. Formulae tra-
ditionally operate at the cell level and thus require two extensions to operate in the new
environment. Firstly, the execution of formulae needs to take uncertainty details into
account. This is mostly transparent to the user. Secondly, we introduce new functions
that allow the user to programmatically access uncertainty details of cells. These func-
tions enable users to pack and unpack uncertainty information contained in cells using
formulae.
Figure 3.4 shows a prototype implementation that demonstrates the application of
this framework. Information uncertainty can be entered into cells and it is automati-
cally propagated through the formulae. The currently highlighted cell E4 contains a
formula that combines an interval with a known quantity, resulting in an interval. A
similar formula is repeated for all cells in the range B4:F7.
3.3.2 Uncertainty Encapsulation

The freedom to choose the appropriate modeling technique for the uncertainty informa-
tion at hand requires an ability to change techniques with minimal disruption. We treat
the uncertainty information as intrinsically connected to the unit of information, an
approach we call uncertainty encapsulation. This allows the user to specify variables
and their uncertainty such that the system is intrinsically aware of the relationship.
Two components are required to facilitate the adaptive modeling characteristics.
The first is an ability to enter variables as uncertain quantities, which uses encapsula-
tion. The system enables the user to specify the modeling technique and uncertainty
parameters within a single cell. Since this information is treated as a unit, the sys-
tem can protect and manage that information. The second component is a suite of
propagation methods that dictate the way uncertainty information is carried forward
through operations. The method is indexed by the desired operation and the types of
parameters. The formula system invokes the appropriate handler using a look-up table.
The screen shot in Figure 3.4 shows several different uncertainty modeling tech-
niques being used to model the variable change. At any time the user can navigate
to any of the input variables and modify them to contain any supported type of un-
certainty. The uncertainty is then automatically propagated to the other cells by the
formulae. Thus, the system adapts to the modeling technique automatically.
3.3.3 Uncertainty Abstraction

Uncertainty data type abstracted visualization ensures that the visualization can cope
with changes in the uncertainty of information. Visualization techniques are built for
generic abstract types and will therefore no longer be bound by the underlying data
type.
To achieve this we introduce two main uncertainty abstraction models, which spec-
ify the interface to uncertainty information. The first is the Unified Uncertainty Model,
which does not distinguish between different types of uncertainty. This model is sim-
plistic in nature but provides sufficient detail to produce all of the visualizations listed
in the background Section 2.4. The second is the Dual Uncertainty Model, which dis-
tinguishes between possibilistic and probabilistic views. This model can be used when
such a distinction is required by the domain or task. Further abstraction models are
possible, which we briefly cover in Chapter 6.
Traditional approaches to visualization are primarily data type centric. This clearly
is not an appropriate approach for data type abstracted visualization. Task-driven ap-
proaches have recently appeared (e.g. [11, 57, 82]), where visualization criteria are
derived from task requirements. The main drawback of these approaches is that they
tend to be application domain specific and therefore lack generality. We argue for a
User-objectives based approach, which is data type independent and not application
domain specific. We provide categories of user-objectives for information uncertainty
visualization. Each objective seeks to highlight a particular aspect of the information
uncertainty according to the insight that the user is trying to gain. We also present an
algorithm for eliciting objectives from the user.
3.4 Summary

This chapter outlined a new approach to information uncertainty modeling and visu-
alization. We provided an analysis of the issues that need to be addressed, which fit
into three main categories. Firstly, current visualization techniques tend to be ad hoc.
Secondly, different uncertainty models are not necessarily consistent with one another.
Thirdly, there exists an artificial separation between variables and their uncertainty de-
tails. This separation effectively promotes uncertainty details to the same level as the
data to which they belong, increasing the potential for user induced error.
We presented a framework that addresses these issues by taking an encapsulation
and abstraction approach to provide an integrated framework. The framework con-
sists of three main parts. The first is an extension to the spreadsheet paradigm as a
visualization platform for information uncertainty. The second is uncertainty encapsu-
lation, which treats information and its uncertainty as a unit. The third is uncertainty
abstraction, which enables visualizations to be built independently of the uncertainty
modeling technique used in the data. Each of these components is detailed in the
following chapters.
Figure 3.4: Screenshot of the Prototype System
CHAPTER 4

Spreadsheet Paradigm for Information Uncertainty
This chapter describes an integrated visualization and modeling system design that uses
a spreadsheet paradigm. This system integrates the modeling and visualization tasks,
allowing a tight feedback loop between visual inspection and data model building.
4.1 Motivation and Objectives

The relationship between spreadsheets and other approaches to data models can be
illustrated using a formal definition. Spreadsheets consist of four components [48]:
the schema, a definition of the spreadsheet logic; the data, which are the instance
values for this spreadsheet; the editorial, which consists of headings, borders, etc.; and
binding, which is the mapping of the content to the tabular structure of cells. It is the
binding property that is responsible for the tabular layout of a spreadsheet.
The spreadsheet paradigm allows a great amount of freedom for users to organize
their information. The freedom to quickly perform experimental calculations that do
not affect the rest of the data model facilitates exploration tasks. However, the draw-
back of this freedom is that spreadsheets can be error prone. The fact that spreadsheets
are so widespread and yet capable of errors has motivated much research into spread-
sheet testing methods [9, 30], particularly where spreadsheets are used for financial
decisions.
The terminology used in this section is as follows. The workbook is made up of
sheets. A sheet is a heterogeneous sparse two-dimensional grid of cells. Sheets are
also theoretically infinite, but practically constrained due to resource limitations. The
heterogeneity refers to the ability to have cells of different types within the same sheet.
They are sparse because cells can be empty. A cell is an addressable spatial location
that contains a unit of information. We use the term uncertainty spreadsheet to refer to
any spreadsheet that includes information uncertainty and reserve the term uncertainty
visualization spreadsheet for spreadsheets that include both information uncertainty
and visualizations.
Our approach integrates both the visualization and modeling tools into a single
system. Spreadsheets are ideal for this because they are interruptible, widely under-
stood, and in a constantly running state. The interruptible characteristic allows the
user to move to another location in the spreadsheet to experiment, without interfering
with their main task. They are widely understood by users because spreadsheet use is
ubiquitous, especially in the financial modeling field. Finally, unlike scripts that must
be run before they produce results, a spreadsheet is constantly in an up-to-date state,
allowing it to be easily interrogated and refined.
There are numerous information uncertainty modeling and display techniques, and
new ones continue to be developed. Therefore a plug-in based architecture is used to
allow new uncertainty data types and display techniques to be added to the system. Fur-
thermore, there are multiple mathematical models for the propagation of uncertainty.
Users require an ability to choose appropriate functions for their task and the system
will need to present the user with options that are semantically valid. Therefore, the
plug-in system also allows new mathematical models to be added.
To support the needs of visualizing information uncertainty, the system must be
capable of mapping uncertainty information to intrinsic properties and geometric ex-
tensions, in addition to stand-alone visual elements. To provide flexibility, the sys-
tem should allow the user to build the visualization using as many visual elements as
needed. The visual elements can then be provided through the plug-in system to allow
extensibility, particularly since new display techniques continue to be developed.
4.2 Related Work on Spreadsheets

The spreadsheet paradigm is widely understood for managing numerical information,
prompting researchers to explore other uses. An early proposal to generalize spread-
sheets is the Analytical Spreadsheet Package [95] (ASP), which allowed any Smalltalk-
80 object to be placed inside a cell and used Smalltalk messages as formulae. While
this provides flexibility, it is too general and complicated for non-expert users to under-
stand. However, ASP did anticipate many of the ideas that were explored in subsequent
papers, such as widgets, which are available in Spreadsheets for Images [65] (SI). SI
extends spreadsheets to include graphical objects, including several different widgets
and images. Further, SI takes the unusual step of allowing formulae to write their re-
sults to a different cell. While this offers flow control, it can complicate the user’s
interpretation of the spreadsheet.
FINESSE [123] specifically targeted real-time financial information, adding im-
ages, heat maps, and graphs to the regular cell types. FINESSE introduced “pre-
sentation relationships”, where groups of cells have access to common presentation
attributes. This provides for shared memory that is not shown in a cell. However,
other systems (including SI) achieve a similar effect by storing presentation attributes
in cells.
The Spreadsheet for Information Visualization [45, 46] (SIV) explored more gen-
eral visualization, building on the Visualization Toolkit [105] (VTK). Each cell in SIV
can contain a visualization, including the data sets used to drive the visualization. Vi-
sualization related operators are available and can operate on multiple cells, such as
a whole column. SIV is motivated by the ability to compare visualizations side-by-
side, particularly to see incremental changes, which is referred to as “small multiples”
after [118, p. 67]. Further, a key advantage of spreadsheets is to use templates for
analysis and experimentation. While suited to visualization tasks, SIV is not partic-
ularly suited to modeling as it is optimized for fewer cells containing larger data-sets
and has dispensed with traditional text and numerical cells.
VisTrails [4, 13] specifically uses a spreadsheet for displaying multiple visualiza-
tions for side-by-side exploration. In this sense the term spreadsheet refers to the
tabular appearance rather than any ability to create formula driven relationships. Sim-
ilarly, tabular visualization methods, such as Hyperslice [122] and TableLens [98],
share some similarity to spreadsheets. However, traditional spreadsheets are sparse,
allow a mix of cells, and offer inter-cell dependencies through formulae. These prop-
erties lend spreadsheets a paper-likeness that separates them from regular tabular displays.
4.3 Architecture and Features

The goal of this approach is to directly support uncertainty information within spread-
sheet cells, in a managed and extensible way. The process for construction of a spread-
sheet using this approach is described later in Section 4.4.
There are two fundamental extensions to traditional spreadsheets that provide for
uncertainty visualization spreadsheets: uncertainty encapsulation, where the uncer-
tainty details are kept together with the information as a unit; and uncertainty ab-
straction, which enables homogeneous visual mapping of uncertainty details despite
differences in modeling techniques.
4.3.1 Uncertainty Encapsulation

We extend the traditional spreadsheet to include novel cell types that facilitate infor-
mation uncertainty modeling techniques. There are a number of different modeling
techniques, but they are all fundamentally similar in that they describe the uncertainty
landscape surrounding a value. Thus, they should be compatible with one another and
should be able to interact using formulae.
The basic data types commonly found in a spreadsheet are shown in Figure 4.1.
ICell is the interface that all cell types must implement. Empty is a cell type that
exists to hold formatting information, such as borders and changed background color.
Label cells hold arbitrary text strings, usually used for documenting the spreadsheet.
Quantity contains a constant number, as entered by the user. Formula cells contain a
formula that is evaluated to produce a result. The result of a formula is either a Label,
Quantity, or Error cell type, which is displayed in the spreadsheet.
[Diagram: ICell at the root, with Empty, Label, Quantity, Formula, and Error as its subtypes.]
Figure 4.1: Basic Cell Type Object Hierarchy
We introduce novel cell types that add uncertainty details to the cell. The novel cell
types model a quantity with added knowledge about its uncertainty. They are therefore
derivatives of the Quantity cell type. We consider the traditional Quantity type to
describe a variable with uncertainty ignorance, since there is no associated uncertainty
information. This means that the variable may or may not be subject to uncertainty,
but the system has no evidence either way.
Examples of common novel cell types include: KnownQty, which represents a
quantity that is known to be exact; Estimate, where the quantity is known to be an
estimate; Interval, representing a continuous interval; and Gaussian, representing a
normal distribution. The hierarchy for these novel cells is shown in Figure 4.2.
[Diagram: KnownQty, Estimate, Interval, and Gaussian as subtypes of Quantity.]
Figure 4.2: Novel CellType Object Hierarchy
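A minimal sketch of the hierarchies in Figures 4.1 and 4.2 might look as follows (illustrative Python; attribute names are assumptions, not the prototype's actual API):

```python
# Illustrative sketch of the cell type hierarchy.
class ICell:                       # interface all cell types implement
    pass

class Empty(ICell): pass           # holds only formatting information
class Label(ICell):                # arbitrary text, e.g. documentation
    def __init__(self, text): self.text = text
class Quantity(ICell):             # "uncertainty ignorance": no evidence either way
    def __init__(self, value): self.value = value

class KnownQty(Quantity): pass     # known to be exact
class Estimate(Quantity): pass     # known to be an estimate
class Interval(Quantity):          # continuous interval
    def __init__(self, lo, hi):
        super().__init__((lo + hi) / 2)
        self.lo, self.hi = lo, hi
class Gaussian(Quantity):          # normal distribution
    def __init__(self, mean, sigma):
        super().__init__(mean)
        self.sigma = sigma
```

Because the novel types derive from Quantity, code written against Quantity continues to work when a cell gains uncertainty details.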
Each cell type has a distinguishing string format by which the user can input it into
the system. Our prototype uses the formats listed in Table 4.1 to determine the type of
cells. If none match then the input is assumed to be a Label type, which is capable of
holding any string value. The string is then parsed by the appropriate handler, which
generates a cell object. For example, the string “10+-2” will be processed by the
interval component of the uncertainty plug-in to produce an interval of 10 ± 2. The
resulting cell object is inserted into the current sheet at the current cursor location.
Type       Format                  Example
Formula    = Expression            =A1+6
Label      'String                 '12.06
Quantity   Number                  12.06
KnownQty   Number#                 12.06#
Estimate   ~Number                 ~12.06
Interval   Number +- Number        12 +- 5.0e-10
           [ Number , Number ]     [1, 2.1]
           [ Number .. Number ]    [-15..15]
Gaussian   Number @ Number         10.1 @ 2.178
Table 4.1: Format of Cells in the Prototype System
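The type-detection step can be sketched as follows (the regular expressions are illustrative reconstructions of Table 4.1, not the prototype's exact grammar):

```python
# Illustrative sketch: classify a cell's input string by the formats of Table 4.1.
import re

NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"   # a plain or scientific number

def classify(text):
    text = text.strip()
    if text.startswith("="):                         return "Formula"
    if text.startswith("'"):                         return "Label"
    if re.fullmatch(NUM + r"#", text):               return "KnownQty"
    if re.fullmatch(r"~" + NUM, text):               return "Estimate"
    if re.fullmatch(NUM + r"\s*\+-\s*" + NUM, text): return "Interval"
    if re.fullmatch(r"\[\s*" + NUM + r"\s*(,|\.\.)\s*" + NUM + r"\s*\]", text):
        return "Interval"
    if re.fullmatch(NUM + r"\s*@\s*" + NUM, text):   return "Gaussian"
    if re.fullmatch(NUM, text):                      return "Quantity"
    return "Label"    # anything else falls back to a Label, per the text

classify("10+-2")     # "Interval", to be parsed as 10 ± 2
```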
Formulae are identified by a leading equals symbol. The remainder of the string is
parsed and converted into a sequence of function calls, with infix operators (e.g. “+”)
being converted into functions (e.g. “add”). Function tables are used to invoke the
appropriate function for the parameter types, which mirrors operator overloading in
traditional programming languages. The use of formulae creates functional relations
between cells and from these relations a dependency tree is built. The dependency
tree lists the cells that directly depend upon a particular cell. It is a stipulation of
spreadsheets that there cannot be any circular references as this would create an infinite
loop. When a user completes updating an existing cell, the system recalculates any
affected cells. Affected cells are determined by walking the dependency tree, starting
with the current node. If the current node is not a member of the dependency tree then
no other cells need updating.
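The recalculation walk can be sketched as follows (illustrative Python; the prototype's actual data structures may differ, and cycle detection is omitted since circular references are already disallowed):

```python
# Illustrative sketch: collect the cells affected by a change by walking
# the dependency tree from the updated cell.
def affected_cells(dependents, changed):
    """dependents maps a cell to the cells that directly depend on it.
    Returns every cell that directly or transitively depends on `changed`."""
    seen, stack = [], [changed]
    while stack:
        cell = stack.pop()
        for dep in dependents.get(cell, []):
            if dep not in seen:
                seen.append(dep)
                stack.append(dep)
    return seen

deps = {"A1": ["B1"], "B1": ["C1", "D1"]}
affected_cells(deps, "A1")   # B1, C1 and D1 need recalculating
```

If the changed cell has no entry in the tree, the walk returns nothing and no other cells are recalculated, matching the behavior described above.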
The display of uncertainty details within each cell can lead to visual clutter. To re-
duce clutter, the system has an option to hide the uncertainty details and instead display
a representative value in the cell. Representative values are estimates of what an uncer-
tain variable might actually be. This works well because the user will typically think of
the cell contents as the representative value. Figure 4.3 shows a screen-shot of the pro-
totype system where uncertainty hiding has been enabled. The currently highlighted
cell shows the value 171.51, whereas the cell content is actually "normpdf(175.51,5)".
This behavior parallels formulae, which display the result of the calculation in the cell
rather than the formula itself. The prototype automatically shades cells to help identify
those that are uncertain. The default colors were chosen to contrast with one another
while remaining subdued. They are user configurable and automatic coloring can also
be disabled.
For many common uncertainty types (e.g. estimates, intervals, Gaussian probabilities) the representative value is straightforward. However, in the case of unusual
uncertainty models this may not be the case. For example, consider a variable that can
only be either 3 or 4, but no other value. The representative value should therefore not
be 3.5, as this would be misleading. We define the principle of representation, which
states that the representative value must be a valid possibility for the variable. The
choice of which value to show depends on the implementation and might be guided by
application domain needs.
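The principle of representation can be illustrated with the two-valued example above (hypothetical Python; the choice of which attainable value to return is implementation-dependent):

```python
# Illustrative sketch of the principle of representation: the displayed
# value must be a valid possibility for the variable.
class TwoPoint:
    """A variable that can only be one of two values, e.g. 3 or 4."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def representative(self):
        # Not the mean (3.5 would be misleading, as the variable can
        # never take that value); return an attainable value instead.
        return self.a

TwoPoint(3, 4).representative()   # 3, never 3.5
```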
4.3.2 Uncertainty Abstraction

The visualization system is implemented as a specialized sheet, called a visualization
sheet. The layout of the visualization sheet matches the scene graph structure [28],
with every non-empty row of the visualization sheet representing a node in the graph.
The starting column indicates the position of the node in the hierarchy, where a row
that begins in column n will be a child node of its nearest preceding row that begins in
columns 1..n−1, or the root node if none can be found. Thus, all rows beginning with
the column “A” will be children of the root node, while rows beginning in column “B”
will be children of a preceding row beginning in column “A,” etc. The first non-empty
column contains a string key that determines the node type. The subsequent columns
contain the parameters for that node. The node objects manage their own scene graph nodes, and it is the responsibility of each node object to perform type checking on its parameters.
Figure 4.4 shows a visualization sheet with three children of the root node: a Title
node, which displays title text; a 2Daxes node, which generates a rectangular grid; and
a Scale node, which adds a scaling to the transformation matrix of its children. The
2Daxes node has two children, which specify the labels for each axis. The Scale node
has a Translate child, and together they position the data correctly over the 2Daxes
object. The Color node specifies that its child should be drawn in blue. The AreaLine
node takes a sequence of lower and upper y-values and produces the polygon repre-
senting the data. All parameter fields can either contain a constant value or a formula.
In Figure 4.4 the AreaLine node is a child of the Color node, which is a child of the
Translate node, which is itself a child of the Scale node. This construction was used so
that the AreaLine would be scaled, positioned, and colored. The first parameter to the
Xaxis and Yaxis nodes determines the normalized position of the origin. Subsequent
parameters contain the axis labels.
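The row-to-hierarchy rule can be sketched as follows (illustrative Python; row contents are reduced to a starting column and a node key, and node keys are taken from the Figure 4.4 example):

```python
# Illustrative sketch: a row starting in column n becomes a child of the
# nearest preceding row starting in a column < n, or of the root otherwise.
def build_tree(rows):
    """rows: list of (start_col, key); returns {child_key: parent_key}."""
    parents, stack = {}, []   # stack holds (start_col, key) of open ancestors
    for col, key in rows:
        while stack and stack[-1][0] >= col:
            stack.pop()       # close rows that cannot be this row's parent
        parents[key] = stack[-1][1] if stack else "root"
        stack.append((col, key))
    return parents

rows = [(1, "Title"), (1, "2Daxes"), (2, "Xaxis"), (2, "Yaxis"),
        (1, "Scale"), (2, "Translate")]
build_tree(rows)
# Title, 2Daxes and Scale are children of root; Xaxis and Yaxis of 2Daxes;
# Translate of Scale.
```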
Typical spreadsheet languages contain basic operators (such as addition, subtrac-
tion, multiplication, and division) and usually a wealth of commonly used functions
(such as statistical and financial functions). To support information uncertainty, the
Figure 4.3: Screen-shot of the Prototype
Figure 4.4: Visualization Sheet for the Graph in Figure 4.7
language needs to be extended to allow access to the underlying uncertainty informa-
tion. Two categories of new functions are added. The first category contains the con-
version operators, which convert from one type of uncertainty model to another. The
second category involves interrogation of the uncertainty details, which is intended to
be used when mapping to visual elements. The prototype system provides four such
interrogation functions, listed in Table 4.2.
Function         Returns
isCertain(x)     True if the cell x does not have associated uncertainty.
Lower(x)         The lower bound of the cell x.
Upper(x)         The upper bound of the cell x.
Certainty(x, y)  The generalized certainty of the cell x being y, where y
                 is either a constant or another cell.
Table 4.2: Prototype Uncertainty Interrogation Functions
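As a minimal sketch, the interrogation functions in Table 4.2 could behave as follows for an interval-valued cell. The `IntervalCell` type and all function names below are illustrative assumptions, not the prototype's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class IntervalCell:
    """Illustrative interval-valued cell (not the prototype's actual type)."""
    lower: float
    upper: float

def is_certain(x):
    # A cell is certain when its bounds coincide: no associated uncertainty.
    return x.lower == x.upper

def lower(x):
    return x.lower

def upper(x):
    return x.upper

def certainty(x, y):
    # Generalized certainty of cell x being the value y: for an interval,
    # 1 inside the bounds and 0 outside (set membership).
    return 1.0 if x.lower <= y <= x.upper else 0.0

growth = IntervalCell(6.0, 8.0)   # the 7 +/- 1 salary growth of Section 4.6
print(is_certain(growth), lower(growth), upper(growth), certainty(growth, 7.5))
```

Because all four functions are defined for every numerical type, a crisp value simply reports coinciding bounds.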
The mathematical model for managing the propagation of uncertainty through op-
erations is user selectable. Our approach is to use a function table, where the system
can look up the appropriate function to handle the requested operation. Plug-ins reg-
ister functions, which have a signature based on the operation and types of the param-
eters. Multiple functions can be registered for the same {operation, parameter types}
signature, but only one can be active at any given time. This includes both standard
functions and the uncertainty interrogation functions in Table 4.2.
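The look-up mechanism described above can be sketched as a dispatch table keyed by operation and parameter types. The registration API and the interval type below are illustrative assumptions, not the prototype's code:

```python
# All names here are illustrative assumptions, not the prototype's code.
function_table = {}

def register(op, types, fn):
    # Registering for an existing {operation, parameter types} signature
    # replaces the currently active handler.
    function_table[(op, types)] = fn

def apply_op(op, a, b):
    # Resolve the handler from the operation and the runtime parameter types.
    return function_table[(op, (type(a).__name__, type(b).__name__))](a, b)

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

# Interval addition: add the bounds pairwise.
register("add", ("Interval", "Interval"),
         lambda a, b: Interval(a.lo + b.lo, a.hi + b.hi))
# Mixed interval/scalar addition shifts both bounds by the scalar.
register("add", ("Interval", "float"),
         lambda a, b: Interval(a.lo + b, a.hi + b))

r = apply_op("add", Interval(6.0, 8.0), 1.5)
print(r.lo, r.hi)
```

Plug-ins supplying an alternative mathematical model would re-register handlers for the same signatures, leaving the spreadsheet contents untouched.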
4.4 New Process and Workflow

We argue for an incremental process for building uncertainty spreadsheets. It is
typically most convenient to begin construction of a model at the high level, where the
focus is on logic, before proceeding to add details. The use of uncertainty modeling
increases detail and is therefore typically most conveniently added later, once the basic
structure of the model is in place.
Figure 4.5 shows the process for building an uncertainty spreadsheet. In the first
step an initial spreadsheet is built, typically using uncertainty ignorance data types.
This step is similar to building a traditional spreadsheet, and includes the spreadsheet
structure, variables, and formulae. Next, the spreadsheet is iteratively refined in three
ways: firstly, uncertainty details are added, changed, or removed from variables; sec-
ondly, visualizations in the model are added, altered, or removed; and thirdly, the
model can be refined in the traditional sense, such as changing formulae or adding
variables.
The task labeled refine uncertainty details consists of two types of activities: adding,
changing, and removing uncertainty detail, and changing the mathematical model for
Figure 4.5: Process for Constructing an Uncertainty Spreadsheet
propagation. There are three main steps to add, change, or remove uncertainty de-
tails: firstly, the appropriate variable is identified, e.g. a variable whose uncertainty
is currently ignored; secondly, its details are changed; and thirdly those changes are
evaluated, returning to the second step if found to be inadequate.
4.5 Capabilities and Advantages

The system described here extends the spreadsheet paradigm using a plug-in
architecture to allow arbitrary cell types. The uncertainty plug-in implements information
uncertainty modeling techniques. This allows users to model a system using informa-
tion uncertainty as a native type. The parameters of the uncertainty are inherent in the
cell, providing structural and semantic support for the uncertainty modeling technique,
thereby avoiding potential errors that can arise when parameters are separated.
Conversion operators are supplied by the plug-ins that allow the user to convert
a cell of one type into another. The user is able to choose the mathematical model
that they wish to use from options supplied by plug-ins. The user is now free to itera-
tively build uncertainty into a model: first, a rough crisp data model is produced as a
proof-of-concept; then, the user refines the data model by promoting variables to add
uncertainty detail. The system handles propagation of the uncertainty automatically
and each variable with uncertainty is treated as a unit of information.
The use of a visualization sheet keeps the interface consistent and brings the power
of formulae to the visual mapping process. Through the combinations of several visual
elements, sophisticated visualizations can be constructed by the user. This provides the
flexibility to perform traditional visualization tasks as well as supporting the sometimes
unusual needs of information uncertainty visualization.
Comparison to Traditional Spreadsheets
The method for incorporating uncertainty in a traditional spreadsheet consists of three
major changes. Firstly, the uncertainty details must be recorded in the spreadsheet
somewhere, resulting in additional cells being used. The addition of new cells changes
the layout of the spreadsheet and increases the amount of information that the user
faces. Furthermore, the number of cells that are added depends on the number of pa-
rameters required by the uncertainty data type. Secondly, formulae need to be changed
to incorporate the propagation of uncertainty details. These formulae become harder
to understand, because the uncertainty information handling obscures the fundamental
operation. The uncertainty information propagation must also be carried forward to
all downstream formulae, which can be many. Thirdly, any graphs or visualizations
should be updated to include uncertainty information as appropriate.
There are four limitations to traditional spreadsheets for incorporating uncertainty.
Firstly, the user is required to be intimately aware of the uncertainty modeling tech-
nique, including rules for its propagation, before they can incorporate it in their model.
Secondly, adding uncertainty information after the model is already in place becomes
an arduous task that is error prone. Should more information come to light, for which
another uncertainty modeling technique is more appropriate, then all affected parts of
the model have to be manually rebuilt. Thus, it is prohibitive to change the level of
uncertainty information after the initial design. Thirdly, changing propagation rules
requires all formulae to be rewritten. Thus, it is also prohibitive to change the math-
ematical propagation model. Fourthly, there are currently few built-in visualization
techniques for information uncertainty. The visualization techniques that are supplied
target specific uncertainty modeling types (e.g. intervals). Creating sophisticated visu-
alizations typically requires users to export their data to a more advanced visualization
system.
Our system overcomes these four limitations. In contrast to adding new cells to
the spreadsheet, our approach is to store the uncertainty information in the same cell.
The immediate advantages of this approach are that the spreadsheet does not change in
layout and the number of cells does not increase, irrespective of the type or volume of un-
certainty information it contains. Furthermore, the system is aware of this uncertainty
information and has a mechanism for resolving appropriate propagation operations in
formulae, meaning that formulae do not change. Thus, to add, change, or remove un-
certainty for a variable is a local change to a single cell. Exceptions only occur where
the user’s chosen mathematical model prohibits particular operations or combinations,
which is no different to any traditional approach. The system resolves operations using
a table of operations that the user can control at a global level. Therefore, should an al-
ternative mathematical model be needed, no change to the actual spreadsheet contents
is required.
Our system uses a flexible visualization sheet that allows sophisticated visualiza-
tions to be explored. The advantage is that any changes to the spreadsheet are immedi-
ately reflected in the visualizations. The next section illustrates these advantages when
applied to a financial model.
4.6 Case Study: Financial Decision Support

This section describes the advantages of using our architecture over a traditional
spreadsheet. It also illustrates how an uncertainty spreadsheet is constructed and used through
a case study. The problem to be explored is understanding and visualizing the prof-
itability of a prospective investment property. Acquiring property for investment and
rental income is a common prospect for many who may not have a background in fi-
nance. However, there are many estimations and subtle interactions between variables
that can have significant effects on the profit outcomes. Furthermore, many of these
interactions are poorly understood or difficult to define, even for experts.
The decision to acquire is based on profitability of the investment. Therefore, the
output of the model is a Net Present Value (NPV) calculation that gives a comparison of
the profitability of buying a property using a deposit against investing that same deposit
into a fixed interest vehicle. A positive NPV indicates that the property investment is
more attractive.
The NPV calculation is as follows:

NPV = ∑_{n=1}^{t} CashFlow(n) / (1 + i_n)^n

where t is the number of years the property is held; i_n is the after-tax interest rate
in year n; and CashFlow is given by

CashFlow = r − p − o + x − CI + CO − u

where r is rental income; p is the loan payment for the current year; o is the ongoing
expenses; x is the tax refund due to investment; CI is the deposit paid on purchase; CO
is the deposit plus net profit on sale; and u is the upfront costs.
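The NPV sum above can be computed directly; the cash flows and rates below are made-up numbers for illustration only:

```python
def npv(cash_flows, rates):
    # cash_flows[n-1] holds CashFlow(n); rates[n-1] holds the after-tax
    # interest rate i_n for year n (1-indexed in the formula above).
    return sum(cf / (1 + i) ** n
               for n, (cf, i) in enumerate(zip(cash_flows, rates), start=1))

# Illustrative three-year example: an initial outflow, then two inflows,
# discounted at a flat 5% after-tax rate.
print(round(npv([-100.0, 60.0, 60.0], [0.05, 0.05, 0.05]), 2))
```

A positive result would indicate, under the model above, that the property investment is the more attractive option.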
Building the Initial Spreadsheet
If uncertainty is ignored, then the common approach to this problem is to create a tab-
ular spreadsheet: each column contains a variable and each iteration of n adds another
row. A summary page is created where input variables, such as increases in salary,
can be placed in an accessible location. The user is able to change input details and
observe their effects over a number of years, usually with the aid of graphs.
Most of the variables in this model are subject to uncertainty. For example: rental
income becomes progressively less certain the farther into the future it is predicted;
loan repayments are similarly uncertain since they are dependent on a variable interest
rate; the tax refund is uncertain because it depends upon taxation law, employment
status, and promotions, all of which can change unexpectedly; and the net profit on
sale is always subject to uncertainty.
Spreadsheet Layout is Unchanged
Adding uncertainty details using our system does not add new cells or change the
spreadsheet layout. Figure 4.6 illustrates this by using the annual salary increase pro-
jected over 20 years in the prototype software. There are four variables: salary growth,
salary, tax, and net income. Uncertainty information propagates from salary increase
to salary to tax. The user wishes to model the salary increase as an interval of 7±1
(6 to 8). Figure 4.6 (a) shows the original spreadsheet model prior to modeling an
interval. Figure 4.6 (b) presents a solution using a traditional spreadsheet approach,
which requires six columns to represent three variables. Each column had to be manu-
ally added and the formula for tax and salary had to be updated to reflect this change.
Figure 4.6 (c) shows our prototype system with uncertainty hiding switched on. The
salary growth field was promoted to an interval (7±1) and no other change was made.
In this view the updated model closely reflects the original1. Figure 4.6 (d) is the
same as Figure 4.6 (c) with uncertainty hiding switched off, thus showing the same
information as is found in Figure 4.6 (b).
Figure 4.6: Interval Modeling Example: (a) Original Model (b) Traditional Spreadsheet (c) Prototype System, Uncertainty Hidden (d) Prototype System, Uncertainty Shown
1Note that the number shown represents the halfway value between the upper and lower limits. The upper limit grows more rapidly than the lower limit, thus the mean value for [6,8]% growth will not match 7% growth.
Formulae are Unchanged
The shaded cells indicate that they contain an interval; thus it can be seen from Fig-
ure 4.6 (d) that the uncertainty is propagated automatically to both salary and tax. The
formulae for these cells, however, are unchanged. It is noteworthy that while the figure
shows the representative value in each cell, the user can always toggle the viewing op-
tion to show the uncertainty details instead of the representative value. The traditional
approach not only changes the layout, but also requires the formulae to be repeated to
calculate both the low and high rates.
To achieve the same effect using a traditional spreadsheet requires more effort.
Firstly, each affected variable must be expanded to two cells, namely the upper and
lower bounds. This typically involves adding an additional column for each variable
that is calculated over multiple years. Secondly, the propagation of the uncertainty
information must be manually managed by adding the appropriate formulae.
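The contrast can be sketched in code: with an interval type and naive interval arithmetic, promoting salary growth to [6, 8]% propagates bounds through the otherwise unchanged formulae. The starting salary and flat tax rate are invented for this sketch, and the `Interval` class is illustrative, not the prototype's implementation:

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __mul__(self, other):
        # Naive interval multiplication: the extremes of all bound products.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

def promote(x):
    # A crisp value is a degenerate interval, so promotion is local to a cell.
    return x if isinstance(x, Interval) else Interval(x, x)

growth = Interval(1.06, 1.08)      # salary growth promoted to 7 +/- 1 %
salary = promote(50000.0)          # invented starting salary
for year in range(2):              # two years of compounded growth
    salary = salary * growth       # the formula itself is unchanged
tax = salary * promote(0.3)        # invented flat 30% tax rate
print(round(salary.lo, 2), round(salary.hi, 2), round(tax.hi, 2))
```

The single promotion at `growth` is the only change; every downstream product carries bounds automatically.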
Visualization can be Abstracted from Uncertainty Type
Using the traditional spreadsheet limited the graphs to those that the program provided,
of which two could be used to indicate the intervals. The first was a graph that used
error bars, while the second was to overlay the maximum and minimum lines on the
same graph as two different data points. However, these traditional graphs only work
with interval data. In contrast, the graph in Figure 4.7 will work with the other data
types.
The graph in Figure 4.7 was generated using three elements in a visualization sheet:
a title text object, a 2D axes object, and a polygon. The 2D axes object takes as param-
eters the label and range for the vertical and horizontal axes. The polygon requires a
color specified in the first four cells, followed by a series of alternating x and y coor-
dinates. The y coordinate is given first by the lower bound of the variable, then the
upper bound, using formulae of the form “=Lower(cellref )”, where cellref is a refer-
ence to a cell containing NPV for the appropriate year. These functions are defined for
all numerical types. For example, the Upper() and Lower() functions return the same
value when that value is certain, resulting in a line graph.
Changing Uncertainty Models is Easy
The user can choose to use the modeling technique that is appropriate for the vari-
able, with little regard for how the rest of the data is modeled. The interest rates are
unlikely to change maximally and more likely to stay even. Therefore, it is desirable
to model the changes in interest rates as a probability distribution. To model this we
choose a Gaussian distribution centered on no change. Using our system, the user
Figure 4.7: Using an Interval (±0.5) for Annual Change in Interest Rates Propagates the Uncertainty to NPV
simply promotes the annual change in interest rates to a Gaussian probability distri-
bution. As with intervals, the uncertainty will be automatically propagated through to
NPV. If multiple uncertain variables interact, our system automatically manages their
combined uncertainty information.
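For the Gaussian case, automatic propagation of a sum can be sketched under an independence assumption (means add, variances add). The class and the numbers are illustrative, not the prototype's implementation:

```python
import math

class Gaussian:
    """Illustrative Gaussian-distributed variable (not the prototype's type)."""
    def __init__(self, mean, variance):
        self.mean, self.variance = mean, variance
    def __add__(self, other):
        # Sum of independent Gaussians: means add and variances add.
        return Gaussian(self.mean + other.mean, self.variance + other.variance)
    def stddev(self):
        return math.sqrt(self.variance)

base_rate = Gaussian(7.0, 0.0)        # a crisp rate is the zero-variance case
annual_change = Gaussian(0.0, 0.25)   # centred on no change, sigma = 0.5
next_year = base_rate + annual_change
print(next_year.mean, next_year.stddev())
```

Operations other than addition, and correlated variables, would need their own registered propagation rules.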
In contrast, the traditional spreadsheet requires more work to achieve the same
effect. Each variable now requires two cells of a different sort: the first cell to contain
the mean, and the second cell for the variance. Every formula that was previously
written to handle the intervals must now be changed to handle normal distributions,
which requires both mathematical competency as well as care to avoid introducing
errors. If multiple uncertain variables interact, then the formulae must be painstakingly
integrated.
Flexibility for Sophisticated Visualization
The ability to use multiple visual elements, and map data to those elements using
formulae, gives the user the flexibility to create sophisticated visualizations such as
Figure 4.8. This figure shows the most likely NPV against the year of sale and the
property value appreciation. The volume is actually composed by layering several
surfaces, with the certainty of NPV mapped to opacity. The color of the surface is red
if the NPV is negative, green otherwise. A wire-frame outline of the extents of the
thresholded NPV volume was added to provide context.
The information shown in Figure 4.8 could not be produced using current spread-
sheets. Firstly, flexibility of visualization was required to stack multiple surfaces with
varying color and translucency together with a wireframe outline into a single 3D
space. Secondly, the calculations that underpin the uncertainty propagation are com-
plicated enough to be prohibitive.
Figure 4.8: Volumetric Representation of the Most Likely Effect Interest Rate Changes Will Have on NPV.
CHAPTER 5
Uncertainty Encapsulation and Automated
Propagation
5.1 Introduction

This chapter addresses issues in modeling and propagation of uncertainty. There are
numerous information uncertainty modeling techniques, each designed for different
situations. To correctly model the uncertainty for a variable, the appropriate uncer-
tainty model needs to be chosen. This uncertainty can change over time, requiring the
data to be updated. In many instances there will be multiple variables that are subject
to different types of uncertainty. These variables may be combined through an oper-
ation and their associated uncertainty must be correctly propagated. Klir’s principle
of requisite generalization [60] requires that the uncertainty models be converted to a
sufficiently general technique capable of handling the combination. While this princi-
ple may be employed ad hoc by a trained mathematician, such a task will be beyond
the capability of many classes of users.
This chapter presents a framework and software design that allows users to easily
transition between uncertainty models and facilitates automatic propagation of uncer-
tainty. Our approach is to encapsulate the uncertainty with the variable into a unit.
This approach has three significant advantages: firstly, the uncertainty models become
polymorphic, allowing the user to think in terms of variables and not modeling tech-
niques; secondly, it provides structural support for dealing with parameters of the un-
certainty, thereby avoiding common errors that arise when related parameters are sep-
arated; and thirdly, it integrates information uncertainty modeling techniques into a
consistent framework, which enables automated propagation methods.
The rest of this chapter is arranged as follows. Section 5.2 examines conceptualiza-
tion, categorization, and data structures of information uncertainty for a unified frame-
work. Section 5.3 builds on this framework to provide a mechanism for automated
uncertainty propagation using look-up tables. The number of entries can become large
and a strategy for dealing with this is presented, called the hierarchical heterogeneous
propagation method. Section 5.4 offers a summary of the chapter.
5.2 Unified Information Uncertainty Framework

This section seeks to integrate information uncertainty modeling techniques into a
coherent framework. We take the perspective that information uncertainty modeling is
a way of improving the fidelity of the model. By adding information about the uncer-
tainty of a variable, the user is increasing the level of knowledge about that variable.
Consider the future employment growth rates that are used in Figure 3.1, which are
shown in Table 5.1. The value used in graph (a) is ignorant of potential for variance
in the predicted growth assuming its value to be certain. In graph (b), it is known that
the value is only an estimate, which is additional knowledge that was not available in
graph (a). Graph (c) adds further information: it is certain that the value will be within
specified bounds. Graph (d) adds even more information, assigning a degree of cer-
tainty to what the actual growth value will be. With each graph, more information is
shown about the future employment rates.
Graph  Growth Rate           Known / Assumed
(a)    0.147, certain        it is certainly 0.147
(b)    0.147, estimated      it is not necessarily 0.147
(c)    [−0.3, 0.6]           it is between −0.3 and 0.6
(d)    μ = 0.147, σ = 0.1    it is probably 0.147
Table 5.1: Predicted Growth Rates used in Figure 3.1
The remainder of this section is split into three parts. First we discuss the con-
ceptualization of information uncertainty from the abstract through to detailed models.
We then categorize modeling techniques by their level of detail. Finally, we describe
the data-structures for encoding various forms of information uncertainty.
5.2.1 Conceptualizing Information Uncertainty and its Usage

Uncertainty exists in many problems, arising from sources such as linguistic ambigu-
ity, the uncertainty of predicting future outcomes, and errors obtained during measure-
ment. In a philosophical sense, a variable begins with total uncertainty as an unknown
variable. This variable can be assigned a name once it comes to the attention of the
user. At this point the value of the variable is still uncertain and has an unknown value.
For example, we can declare the variable α without declaring the value of α. Since it
has unknown value, it has a theoretically infinite potential to be any value. Thus, every
value of α is both possible and, as far as we know, equally likely.
A variable will most typically be assigned a particular value, for example α = 5.
However, this does not eliminate the uncertainty, because there are three possibilities:
firstly, five can be the actual value beyond any doubt (absolute certainty); secondly, α
can be guessed to be five (uninformed estimate); or thirdly, α can be assumed to be five
but it is unknown whether there is potential to be otherwise (uncertainty ignorance).
Knowing only the value of a variable does not explicitly differentiate between these
three possibilities. However, many data models that are used in the real world stop
at this level of detail, thus discarding valuable information. Uncertainty ignorance is
usually implied: a single value is commonly used even for variables that have an implicit
uncertainty. For example, the future spot price of oil can be estimated at a particular
value and it is generally understood that this value is not certain.
If it is known that the value of the variable cannot be outside a particular range, then
more is known about the variable and the amount of uncertainty is therefore reduced.
For example, the interval 4.5 ≤ α ≤ 5.5 allows for a 0.5 margin of error around five
and is often written as α = [4.5,5.5]. Figure 5.1 illustrates an example progression
of α from unknown variable to interval. Each box represents the type of information
uncertainty, while arrows represent new information about the variable α . Information
about the uncertainty is referred to as uncertainty information. As more uncertainty
information is added, the information uncertainty becomes better defined.
Unknown Variable → Unknown Value → Uncertainty Ignorance (α = 5)
→ Uninformed Estimate (5 is an estimate) → Interval (4.5 ≤ α ≤ 5.5)
Figure 5.1: Progression Through States of Information Uncertainty (Boxes) as a Resultof Information (Arrows)
Variables that are subject to information uncertainty can be considered to be a cloud
of potential values in an uncertainty space. The appropriate information uncertainty
data structure is used to parametrize the uncertainty using available knowledge. At
some point, the variable is collapsed onto the real-number line to produce an estimate.
Figure 5.2 illustrates these concepts: the information uncertainty is defined by a cloud
in uncertainty space, while a collapsed value is an estimate on the set of real numbers.
Figure 5.2: Projection of Information Uncertainty onto an Estimate Point
A data model without uncertainty details is, in fact, a data model that deals with
already collapsed values. Any difference between the value in the data model and the
actual true value is error. The user may or may not be aware of this error. In the case
of predictions, for example, it is generally understood that there can be a divergence
between the model and the actual value. However, there are other sources of uncer-
tainty where the potential for error may not be so obvious. In particular, the potential
for error accumulates as variables are combined through mathematical operations. The
alternative approach is to incorporate uncertainty details in the model, which delays the
collapse until it is necessary. Delaying the collapse as late as possible allows the poten-
tial error to be carried throughout the model. This uncertainty can then be visualized
to gain a deeper understanding of the impacts it can have.
We now briefly cover major uncertainty modeling techniques with reference to po-
tential collapses and the uncertainty parameters. Defining α as an interval determines
the possible values that α can hold, since a potential value of α is only possible if it is
within the interval. Thus, if α is defined by an interval bounded inclusively by l and u
such that l ≤ u and l,u∈R, then any potential value of α must be an element of the set
[l,u]. Another way of expressing set membership is through a membership function,
μ(x) = { 1,  l ≤ x ≤ u
       { 0,  otherwise
where a value of 1 indicates inclusion and 0 indicates exclusion. The uncertainty arises
because all values x are valid collapses of the variable α so long as μ(x) = 1. Figure 5.3
graphs the membership function and all collapses for the interval [4.5,5.5].
Figure 5.3: All Collapses of the Interval [4.5,5.5]
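The interval membership function defined above can be transcribed directly, here for the interval [4.5, 5.5]:

```python
def membership(x, l=4.5, u=5.5):
    # mu(x) = 1 if l <= x <= u, else 0; every x with mu(x) = 1 is a
    # possible collapse of the interval-valued variable.
    return 1 if l <= x <= u else 0

print([membership(x) for x in (4.0, 4.5, 5.0, 5.5, 6.0)])
```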
Finer grades of uncertainty specification are allowed when the range of μ is not
constrained to either 0 or 1. For example, rough sets allow three possible states: totally
an element, partially an element, and not an element; while fuzzy sets allow the range
of μ to vary over the interval [0,1]. So long as μ(x) > 0, a collapse to x is possible, as
illustrated by the graph of a fuzzy variable in Figure 5.4.
Figure 5.4: All Collapses of a Fuzzy Number Around 5
A graded membership allows the user to specify the degree to which potential
values should be considered a valid collapse of the variable. The advantage of doing so
is that a cut plane can be used to control the amount of compliance desired. A cut plane
is a minimum value that is required of μ in order to be considered for collapse. For
example, Figure 5.5 shows a reduced collapse set for the same variable as in Figure 5.4
that results from using a cut plane.
Figure 5.5: Collapses for a Fuzzy Number Around 5, Using a Cut Plane
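A cut plane can be sketched as a filter over a graded membership function; the triangular shape used here is an illustrative choice, not necessarily the one in Figure 5.4:

```python
def triangular(x, a=4.0, peak=5.0, b=6.0):
    # Piecewise-linear membership rising from a to the peak, falling to b.
    if x <= a or x >= b:
        return 0.0
    if x <= peak:
        return (x - a) / (peak - a)
    return (b - x) / (b - peak)

def cut(values, level):
    # The cut plane keeps only values whose membership meets the level.
    return [x for x in values if triangular(x) >= level]

xs = [4.0, 4.25, 4.5, 4.75, 5.0, 5.25, 5.5, 5.75, 6.0]
print(cut(xs, 0.5))
```

Raising the cut level shrinks the collapse set, exactly as Figure 5.5 illustrates.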
Probability based systems have a similar construct, called the Probability Density
Function (PDF). There is an additional requirement that the integral of a PDF should
be 1. Infinitesimally small likelihoods are approximately zero for practical purposes.
The various types of sets and the probability density functions can also be discrete,
in which case the membership or probability is only non-zero for a finite number of
collapses. An example is the probability density function for the sum of a throw of two
dice: the probability is only non-zero for integers from 2 to 12. All other outcomes are
impossible, including fractional outcomes.
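The two-dice example can be checked directly; exact rational arithmetic keeps the probabilities precise:

```python
from fractions import Fraction
from itertools import product

def two_dice_pmf():
    # Discrete probability function for the sum of two fair dice:
    # non-zero only for the integers 2..12.
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        s = a + b
        pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = two_dice_pmf()
print(pmf[7], pmf[2], sum(pmf.values()))  # 1/6 1/36 1
```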
This section described information uncertainty in terms of an uncertainty space
and its real number collapses. Several types of uncertainty modeling techniques were
reviewed with reference to this conceptualization. We now categorize these into a
coherent framework.
5.2.2 Categorization of Uncertainty Models

There are several modeling techniques that can be used to describe information uncer-
tainty with varying degrees of knowledge. We place these techniques into one of five
general categories: estimate, where the value is not guaranteed to be the true value;
non-specificity, where the true value is known to be one of a set of values; probabil-
ity, where the likelihood of the true value is known; membership, where the degree of
membership1 within a group or label is known; and belief, where the believability of
values is known. The categories are based on the type of knowledge that is encoded,
which is necessary for determining suitable display techniques. Table 5.2 presents
these categories together with commonly used modeling techniques.
5.2.2.1 Known Value
The known value types indicate that the value is known to be correct. The fact that this
value is known to be correct is information about the uncertainty, namely that there is
no uncertainty. While traditional techniques (scalar values, etc) are often assumed to
be known values, we reserve this category for value types that specifically hold values
that are known to be exact. We consider traditional techniques to represent uncertainty
ignorance, since it is not explicitly stated whether the value is true or not.
5.2.2.2 Estimation
In the context of this categorization system the term estimation refers to a reasonable
guess at the value of a variable. The estimation category describes modeling techniques
where the data is known to be uncertain, but no parameters of the uncertainty are
encoded. For example, the predicted gross domestic product for a country can be given
1Standard sets do not employ a notion of partial membership and are thus part of the non-specificity category.
Category            What is known                      Example Modeling Techniques
Known Value         The value is accurate              traditional scalar
Estimation          The value is not accurate          traditional scalars, vectors
Non-specificity     The value is one of known          intervals, sets
                    alternatives
Probability         The likelihood of each             probability distributions
                    potential value
Partial Membership  The degree of belonging to each    rough sets, fuzzy sets
                    set for each potential value
Belief              Availability of evidence           Dempster-Shafer calculus,
                                                       Bayesian probabilities
Table 5.2: Categories of Information Uncertainty Modeling Techniques
as a number. It is understood that this might not be the exact future gross domestic
product, but the variable does not provide any further details about this uncertainty.
For this reason, traditional encodings that do not model uncertainty, such as
floating-point numbers, can act as estimations. Estimations provide a heuristic ap-
proach to uncertainty, allowing each user to interpret its meaning according to their
own understanding.
5.2.2.3 Non-specificity
Non-specificity refers to methods that describe a realm of possible matches, but are
non-specific about which match it is. An example of a problem suitable to non-
specificity is errors in measurement, where the real value is considered to be one of
a possible range.
The basic modeling technique underlying non-specificity is the mathematical set.
Such a set can be a discrete set, such as the set of possible conditions as diagnosed by
a doctor; or a continuous set, such as the interval covering the measurement error.
Non-specificity states that one of the alternatives is assumed to be the case, but
does not express a preference for which one, nor does it allow for partial matches. For
visualization this poses a challenge to avoid misinterpretation as to the likelihood of
alternative possibilities. However, non-specificity models can be considered to imply
the degree of uncertainty by the number of alternatives offered. As the number of
possible alternatives reduces, the uncertainty of each member also reduces until the
limit of no uncertainty, which occurs where there is only one possibility.
68 Chapter 5. Uncertainty Encapsulation and Automated Propagation
5.2.2.4 Probability
The class of modeling techniques that detail the level of likelihood for positive matches
use probability. Probabilities are derived by determining the proportion of inputs that
are likely to produce positive matches, given random samples. An example of a prob-
lem suitable to probability is a variable whose prior values are known, from which
a picture of likely future values is built. The universe of discourse is described as a
joint probability, which provides an expectation of every possible outcome. An ex-
ample from the medical domain is the probability of a particular disease, given the
socio-demographic profile of the patient. This probability was derived from prior ob-
servations of the population.
Probability differs from non-specificity because it quantifies the expectation for
each match, but is similar to non-specificity because it assumes only a single positive
outcome. Although the probability of an outcome can appear insignificantly low, it is
still possible. Depending on the user objective during visualization, attention may be
required to ensure that low probabilities are not masked by more likely options.
5.2.2.5 Partial Membership
The partial membership category of information uncertainty modeling techniques encodes a degree of membership, allowing for partial membership in multiple sets. Commonly, degree of membership modeling techniques are used in situations where the
source of the uncertainty arises from vagueness and they are particularly suited to nat-
ural language problems. An example problem for which the degrees of membership
are suitable is where human interpretations need to be encoded as variables. Another
use for degree of membership is simplification of complicated systems: Systems of
many rules and exceptions can often be replaced with simpler, more elastic, rules that
observe partial membership conditions.
Information modeling techniques that express degrees of membership include rough
sets and fuzzy sets, both of which have accompanying novel definitions for logical op-
erations [89, 79, 77]. An example in the medical scenario would be a situation where
the medical record states that “there were strong signs of epidermal scarring”. In this
example the degree of membership in the “epidermal scarring” set would be strong,
which may translate into a 0.9 degree of membership. The degree of membership in
the set of “no epidermal scarring” would be weak, or 0.1.
Unlike non-specificity and probability, which describe a single positive match, de-
gree of membership approaches allow the same variable to be treated as a positive
match in several sets simultaneously. The greater the degree of membership within a
set, the more certainly the variable can be treated as a member of that set. Unlike non-
specificity, where the uncertainty is implied by the number of alternatives, the amount
of uncertainty is directly encoded by the degree of membership.
5.2.2.6 Belief
The final category of information uncertainty modeling techniques deals with belief.
Degrees of belief are used in evidence-based reasoning systems, such as expert sys-
tems. An example of a problem suitable to degrees of belief is a situation where the
veracity of the outcome must be supported by evidence.
Techniques for operating on degrees of belief include Bayesian inference, which
uses probabilities to model belief. Other techniques include Dempster-Shafer calculus (see [107]). Belief is similar to probability, because the uncertainty of a positive
match is encoded. Degrees of belief are unlike probabilities, because they are not
based on knowledge of the universe of discourse, but give an indication of certainty
that a positive match exists based on available evidence. Sometimes this evidence is
subjective.
An example in the medical domain is the degree of belief that a recently discovered
disease is contagious. Given the contagiousness of all known diseases, the probability
of a new disease being contagious might be 0.25; however, without any evidence to
suggest that the new disease is contagious, the degree of belief that the new disease is
contagious is zero.
For visualization purposes, degrees of belief can be treated similarly to probabil-
ity. However, users may more often be concerned by exceptional or poorly supported
matches.
5.2.3 Data Structures for Information Uncertainty
This section outlines data structures for recording uncertainty information about variables. These data structures hold parameters that describe the uncertainty space, from
which the possible collapses can be derived. Common information uncertainty model-
ing techniques are listed in Table 5.3. Items in italics indicate abstract types.
The data structure descriptions given here assume that instantiated data types can
be distinguished at runtime. If the environment does not support this feature, then the
data structures would need to be extended to include a type identifier.
The Quantity type holds a single real number and indicates a variable with uncertainty ignorance. All data structures listed here inherit from the Quantity type,
since they can be considered to be refinements on uncertainty ignorance. The absolute
certainty and unsubstantiated estimate types are derivatives of Quantity that do not add
further parameters, so long as the environment can distinguish between them.
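Under the assumption of a language with runtime type inspection, the relationship between Quantity and its derivative types can be sketched as follows. This is a hedged Python illustration; the class names are adapted from Table 5.3 and are not the thesis implementation:

```python
class Quantity:
    """Single real number; the potential existence of uncertainty is ignored."""
    def __init__(self, value: float):
        self.value = value

class AbsoluteCertainty(Quantity):
    """A value known to carry no uncertainty; adds no further parameters."""

class UnsubstantiatedEstimate(Quantity):
    """An estimate or 'guess'; adds no further parameters."""

x = UnsubstantiatedEstimate(5.0)
# Runtime type inspection distinguishes the refinements from plain Quantity:
assert isinstance(x, Quantity) and type(x) is UnsubstantiatedEstimate
```

In an environment without runtime type information, each record would instead carry an explicit type identifier field, as noted above.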
The discrete classical set type stores a set of potential values. A single value from
the set is chosen to be the representative value. It is invalid for the representative value
70 Chapter 5. Uncertainty Encapsulation and Automated Propagation
Type                                          Description
Quantity (Uncertainty Ignorance)              Ignore the potential existence of uncertainty
Absolute Certainty                            No uncertainty
Unsubstantiated Estimate                      An estimate or "guess"
Confidence                                    Singular valued graded possibility
Discrete Classical Set                        Set of possibilities
Interval                                      Convex continuous set
Set of Intervals                              Arbitrary continuous set (often non-convex)
Discrete Rough Set                            Discrete set, three grades of possibility
Continuous Rough Set                          Set with three grades of possibility
Discrete Fuzzy Set                            Discrete set with infinite grades of possibility
Continuous Fuzzy Set                          Continuous set with infinite grades of possibility
Linearly Defined Fuzzy Set                    Fuzzy set defined by a sequence of line segments
Discrete Probability Distribution             Discrete probability distribution (PMF)
Continuous Probability Distribution           Continuous probability distribution
Linearly Defined Probability Distribution     Continuous distribution defined by a sequence of line segments
Uniform Continuous Probability Distribution   Uniform distribution over a bounded interval
Gaussian Probability Distribution             Normal distribution defined by a mean and variance

Table 5.3: Common Information Uncertainty Modeling Types
to not be a member of the set (see the principle of representation in Section 4.3.1). The
set is discrete and itemizes every possible collapse.
The interval type stores a continuous set of values that is bounded by an upper and a lower value. Each boundary can either be inclusive, which means that the boundary
value is an element of the set, or exclusive, in which case the boundary is not included.
The representative value defaults to the mid point between the boundaries. The interval
data structure is illustrated in Figure 5.6.
representative value: real
lower bound → { value: real, inclusive: boolean }
upper bound → { value: real, inclusive: boolean }
Figure 5.6: The Interval Data Structure
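The interval record can be sketched as a small class. This is a hedged Python illustration (the class and field names are my own, not the thesis implementation), with the representative value defaulting to the midpoint as described above:

```python
from dataclasses import dataclass

@dataclass
class Bound:
    """One boundary of an interval: its value and whether it is included."""
    value: float
    inclusive: bool = True

@dataclass
class Interval:
    """Continuous set of values bounded by a lower and an upper value."""
    lower: Bound
    upper: Bound
    representative: float = None  # defaults to the midpoint

    def __post_init__(self):
        if self.representative is None:
            self.representative = (self.lower.value + self.upper.value) / 2.0

    def contains(self, x: float) -> bool:
        """Membership test honoring inclusive/exclusive boundaries."""
        above = x > self.lower.value or (self.lower.inclusive and x == self.lower.value)
        below = x < self.upper.value or (self.upper.inclusive and x == self.upper.value)
        return above and below

iv = Interval(Bound(2.0), Bound(3.0))
# iv.representative == 2.5; iv.contains(2.0) is True
```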
The set of intervals type is used to describe arbitrary continuous sets. This data
structure is required to allow a continuous set that is not convex, since intervals are
necessarily convex. The representative value can be chosen by the user, so long as it
adheres to the principle of representation.
The continuous rough set type is used to describe an arbitrary rough set. Rough
sets allow three grades of possibility: elements are completely within the set, partially
within the set, or not within the set, which we map to membership values of 1, 0.5,
and 0, respectively. The continuous rough set data structure consists of a sequence
of membership change markers, which indicate the points at which the membership
function changes. The markers consist of two pieces of information: firstly, the point
at which the membership function changes; and secondly, the new value that the membership function takes from that point until the next marker. The final marker's value holds from that point to positive infinity, while the membership for values from negative infinity to the lowest marker is specified in a separate field called priorμ.
Figure 5.7 illustrates the rough set data structure and its membership function. The
priorμ is zero, thus the membership function is zero between −∞ and the first point.
The first change in the function is at 2, as defined by the first marker in the sequence.
Each marker signals a change in membership and the last marker sets the membership
value from there to ∞. Although the representative value can be any value that has a
non-zero membership, it is usual for the representative value to have unit membership.
The discrete rough set type is similar to the discrete classical set, except that each element additionally specifies a degree of membership, {value, membership}. The degree of membership must be 1, 0.5, or 0.
priorμ: 0
markers: { {2, 0.5}, {5, 1}, {11, 0.5}, {12, 0} }
Figure 5.7: Definition of Continuous Rough Set Using a Marker Sequence
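Evaluating this marker-based membership function can be sketched in a few lines of Python. This is a hedged illustration; the function name is my own and the thesis does not specify an implementation:

```python
import bisect

def rough_membership(x, prior_mu, markers):
    """Evaluate a continuous rough set's membership function at x.

    markers is a sorted list of (point, new_membership) pairs; each marker's
    membership value holds from its point until the next marker, and the
    final marker's value holds to positive infinity.  prior_mu covers all
    values below the first marker.
    """
    points = [p for p, _ in markers]
    i = bisect.bisect_right(points, x)  # number of markers at or below x
    if i == 0:
        return prior_mu
    return markers[i - 1][1]

# The set from Figure 5.7:
markers = [(2, 0.5), (5, 1), (11, 0.5), (12, 0)]
# rough_membership(1, 0, markers) == 0; rough_membership(6, 0, markers) == 1
```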
The linearly defined fuzzy set type models a fuzzy set whose membership function
is defined by a sequence of connected line segments. Similarly to the continuous rough
set, a sequence of markers is used. However, the markers now indicate the vertices
between line segments. The value of the first marker is assumed to extend to −∞ and
the value of the last marker extends to ∞. The membership function for values between
markers is a linear interpolation of the two closest markers. Figure 5.8 illustrates the
membership function and data for a fuzzy set. It is usual for the representative value to
have maximum membership.
markers: { {2, 0}, {5, 1}, {11, 1}, {12, 0} }
Figure 5.8: Definition of Linearly Defined Fuzzy Set Using a Marker Sequence
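The interpolation rule above can be sketched as follows. This is a hedged illustration; the function name is my own:

```python
def fuzzy_membership(x, markers):
    """Membership of x in a linearly defined fuzzy set.

    markers is a sorted list of (point, membership) vertices; membership is
    linearly interpolated between adjacent vertices, and the first/last
    vertex values extend to negative/positive infinity respectively.
    """
    if x <= markers[0][0]:
        return markers[0][1]
    if x >= markers[-1][0]:
        return markers[-1][1]
    for (x0, m0), (x1, m1) in zip(markers, markers[1:]):
        if x0 <= x <= x1:
            # Linear interpolation between the two closest markers.
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

# The set from Figure 5.8:
markers = [(2, 0), (5, 1), (11, 1), (12, 0)]
# fuzzy_membership(3.5, markers) == 0.5
```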
The discrete fuzzy set type is similar to the discrete rough set, except that the de-
gree of membership can be any value from 0 to 1. Again, the representative value
usually has the highest membership value. Similarly, the discrete probability distri-
bution stores a finite set of {value, probability} combinations. While the range of the
probability components is [0,1], their sum must equal 1. The linearly defined prob-
ability distribution is the probability version of the linearly defined fuzzy set and is
implemented using the same mechanism. This data structure provides the freedom to
approximate almost any distribution by sampling it. However, this approximation may
not be sufficiently accurate for certain applications.
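The constraints on a discrete probability distribution, each probability in [0, 1] and the probabilities summing to 1, can be checked mechanically. A small hedged sketch (the function name is illustrative):

```python
def validate_pmf(pairs, tol=1e-9):
    """Check a discrete probability distribution, given as a list of
    (value, probability) pairs: each probability must lie in [0, 1] and
    the probabilities must sum to 1 (within a floating-point tolerance)."""
    if any(not 0.0 <= p <= 1.0 for _, p in pairs):
        return False
    return abs(sum(p for _, p in pairs) - 1.0) <= tol

pmf = [("heads", 0.5), ("tails", 0.5)]
# validate_pmf(pmf) is True; a PMF summing to 1.4 would be rejected
```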
Two common probability distributions are the uniform continuous probability dis-
tribution and the Gaussian probability distribution (also referred to as the normal dis-
tribution). The uniform distribution is defined by the bounds of an interval and all
values within this range are equally likely. The PDF of the uniform distribution is f(x; a, b) = 1/(b − a) for a ≤ x ≤ b and 0 otherwise, where a and b define the interval. The Gaussian distribution also requires only two parameters: the mean, μ, and the standard deviation, σ. Its PDF is f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).
There exist several well-known probability distributions that are evaluated analytically. These include the beta distribution, which is supported on the bounded interval [0, 1], and whose PDF is f(x; α, β) = x^(α−1)(1 − x)^(β−1) / ∫₀¹ u^(α−1)(1 − u)^(β−1) du. Other examples are the Laplace distribution, the exponential distribution, and the Cauchy-Lorentz distribution. Interested readers are directed to probability texts such as [111].
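The uniform and Gaussian PDFs can be evaluated directly from the two-parameter definitions above; a small Python sketch:

```python
import math

def uniform_pdf(x, a, b):
    """PDF of the uniform distribution on the interval [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, mu, sigma):
    """PDF of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# uniform_pdf(2.5, 2, 4) == 0.5; gaussian_pdf peaks at x == mu
```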
5.3 Automated Propagation of Information Uncertainty
Uncertainty needs to be propagated to the result whenever an uncertain variable is involved in an operation. Thus, supporting arithmetic for information uncertainty requires
that methods be defined to propagate the uncertainty correctly. Current analytical un-
certainty methods require propagation to be carried out manually. This is cumbersome
and makes it easy to introduce avoidable errors.
This section is devoted to automatic propagation of uncertainty. Two aspects are
covered. Firstly, a mechanism for supporting automated propagation is presented,
called the uncertainty propagation model. This mechanism is configurable and exten-
sible, but it requires rules to be defined for different combinations of variable types and
these can become numerous. The second part presents the hierarchical heterogeneous
propagation, which provides a means for resolving a default course of action. This
reduces the complexity of the propagation model as fewer rules need to be defined.
5.3.1 Uncertainty Propagation Model
In a typical spreadsheet system there are several types of operators, ranging from basic arithmetic through to pseudo-random number generators. For the purpose of this section, an operation refers to an indivisible unit of work: it has a single output and optionally multiple inputs. In order to provide automated uncertainty propagation, most
of these operators will require extensions to deal with information uncertainty. Partic-
ularly, operators that take at least one variable as their input will likely propagate the
uncertainty from the variable (or variables) to their output.
New information uncertainty modeling techniques might be added to the system.
For example, someone may wish to implement a beta distribution type. It is therefore
necessary to have a mechanism to add appropriate propagation methods as well. Fur-
thermore, some users and domains may have task-specific propagation requirements.
This means that the user will wish to swap out certain propagation methods for alterna-
tive implementations. For example, in a risk assessment application the user may wish
to employ a principle of maximal uncertainty, whereas a remote sensing user may find
a principle of minimal uncertainty to be more appropriate.
There are three requirements for the automated propagation system. Firstly, it must be extensible, allowing new propagation methods to be added in the future. Secondly,
it must be able to resolve which method to employ in the face of multiple alternatives.
Thirdly, it should be configurable by the user: users will want to choose which methods to use, and may change their mind.
Most commonly used operators in a spreadsheet system will be either unary (taking
one input) or binary (taking two inputs). The propagation model described here is gen-
eralized to handle any n-ary operator. This includes n = 0, which takes no parameters.
We define the Uncertainty Propagation Model (UPM), which uniquely maps an operator-operand signature (sig) to an Uncertainty Propagation Method (method). The sig consists of the operator identifier (e.g. "+" for addition) and the types of the operands. The propagation method is a handler that takes the operands and returns a result. Thus,
UPM: sig → method
where sig = {String, Type, Type, ...} and method: {C1, C2, ...} → C, where C is an object encapsulating an information unit and its associated uncertainty details. For example, consider the following equation: x = 8 + ~5, where ~5 means "approximately five". The signature becomes {"+", Quantity, Estimate}, indicating the addition of a quantity to an estimate type. The propagation model maps this signature to a method that can handle addition between quantity and estimate types. The parameters to the method are {8, ~5} and it returns ~13, which means "approximately thirteen".
When a sig cannot be found in the propagation model, the operation is illegal and
the result is an error. Typically, an error condition is handled by returning a special
error method, which takes any arguments and returns an error as the result. A sig
may also map to nil, explicitly indicating that no conversion is permitted and attempts
to do so will result in an error. This enables users to deliberately disallow particular
combinations.
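A propagation model of this kind can be sketched as a lookup table from signatures to handlers. This is a minimal Python illustration of the mechanism described above; the class names, the Estimate wrapper, and the error convention are my own assumptions, not the thesis implementation:

```python
class Quantity:
    """A plain value; uncertainty is ignored."""
    def __init__(self, value):
        self.value = value

class Estimate(Quantity):
    """An unsubstantiated estimate: a value with unquantified uncertainty."""

def add_quantity_estimate(a, b):
    # The sum of a quantity and an estimate is itself an estimate (~13 in the text).
    return Estimate(a.value + b.value)

def error_method(*args):
    raise TypeError("no propagation method for this signature")

UPM = {
    ("+", Quantity, Estimate): add_quantity_estimate,
    ("+", Estimate, Quantity): lambda a, b: add_quantity_estimate(b, a),
    ("+", Quantity, Quantity): lambda a, b: Quantity(a.value + b.value),
    # An explicit nil entry (None) would deliberately forbid a combination.
}

def apply_op(op, *operands):
    """Build the signature from the operator and operand types, then dispatch."""
    sig = (op,) + tuple(type(x) for x in operands)
    method = UPM.get(sig, error_method)
    if method is None:  # explicit nil mapping: combination forbidden
        method = error_method
    return method(*operands)

result = apply_op("+", Quantity(8), Estimate(5))
# result is an Estimate with value 13, i.e. "approximately thirteen"
```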
5.3.2 Hierarchical Heterogeneous Propagation
The number of entries in an uncertainty propagation model will grow exponentially
as new data types are added to the system. Adding methods can therefore become
a significant task. Hierarchical heterogeneous propagation helps to overcome this is-
sue by searching for a suitable combination using existing methods. Parameters are
implicitly converted into compatible types. This requires that the uncertainty model-
ing techniques be integrated into a coherent framework such that the uncertainty can
be captured using interchangeable models. This is similar to the aims of generalized
information theory (see [60]).
The principle behind hierarchical heterogeneous propagation is to arrange uncer-
tainty modeling techniques in a hierarchical order of increasing uncertainty detail. We
group these techniques into three strata: the top tier being the crisp strata; the middle
tier being the bounded strata; and the lower tier being the explicit strata. Categories
of the information uncertainty modeling techniques and their stratification is shown
in Figure 5.9. The crisp strata includes the singular valued types, which are known
value, uncertainty ignorance, and estimate. These types specify a singular value and
optionally describe the presence or otherwise of uncertainty, but do not specify further
details about any potential uncertainty. The bounded strata includes classical set based
uncertainty types, which are classical sets, intervals, and sets of intervals. These types
specify the boundary between the possible collapse values and those that are not con-
sidered possible. The explicit strata includes types that explicitly specify the degree of
uncertainty for potential values, such as rough sets, fuzzy sets, and probabilities.
crisp strata: known value, uncertainty ignorance, estimate
bounded strata: non-specificity
explicit strata: probability, membership, belief
(uncertainty detail increases from the crisp to the explicit strata)
Figure 5.9: The information uncertainty modeling techniques sorted into three strata
Estimates belong in the crisp strata, because estimates represent values that are
treated as though they were the true value. No information about the uncertainty is
known about an estimate, beyond its existence. In the bounded strata, the modeling
techniques encode the boundaries between what is possible and what is not. For exam-
ple, the accuracy of a measurement device can be specified as ±10 units. This means
that it is certain that the measurement is within 10 units (assuming the device is work-
ing correctly), however, nothing is stated about the uncertainty of the possible values.
Put another way, it is not stated whether +1 is more certain than, less certain than, or equally certain as +2.
The explicit strata contains modeling techniques that explicitly state the degree of un-
certainty of all candidate values. For example, with reference to the probability in
Table 5.1 (d), it is more uncertain that the predicted employment growth rate in Cali-
fornia will be -0.3 than 0.147.
Each strata provides increasing detail about the uncertainty over the previous one.
As users refine their model, they can progress downward along the strata with the ad-
dition of more information. Thus, a variable may begin as an estimate, then be refined
to an interval once the extents are known, then be refined further into a probability
distribution as the full likelihood becomes known. This refinement is implied by the
arrows in Figure 5.9. The reverse operation is also possible where, by removing or
simplifying information, the variable can be modeled using a technique that is further up the level-of-detail tree. Figure 5.10 shows an example of how a variable might
proceed to be refined. The left hand path shows an example of a convex variable, while
the right hand path shows a non-convex example. Convexity is defined by continuity
of the collapses.
Single-valued Variable (Information: Assumed Total; Uncertainty: Assumed None)

Convex path:
Convex Set (Interval) (Information: Bounded Range; Uncertainty: Non-specificity)
Graded Convex Possibility (Information: Graded Possibility; Uncertainty: Graded Non-specificity)
Convex Probability Distribution (Information: Probability Distribution; Uncertainty: Strife of Outcome)
Imprecise Probability (Convex) (Information: Probability, Intervals; Uncertainty: Probability & Outcome)

Non-convex path:
Classic Set (Non-convex) (Information: Bounded Range; Uncertainty: Non-specificity)
Graded Arbitrary Possibility (Information: Graded Possibility; Uncertainty: Graded Non-specificity; rough set is a specific case)
Arbitrary Probability (Information: Probability Distribution; Uncertainty: Strife of Outcome)
Arbitrary Imprecise Probability (Information: Probability, Intervals; Uncertainty: Probability & Outcome)
Figure 5.10: Example of Increasing Levels of Uncertainty Information
Adding uncertainty information to a variable is called promotion and removing de-
tail is called demotion. A change in the uncertainty modeling technique of a variable
that does not change its level of detail is a conversion. The hierarchical heterogeneous
propagation mechanism uses a combination of promotions, demotions, and conversions until it can find a suitable signature in the propagation model.
Promotion, demotion, and conversion are performed using casting operators. These operators take a single operand and return an approximately equivalent representation using a different modeling type. These operators are part of the propagation model, such that P ⊆ PM, D ⊆ PM, N ⊆ PM, and P ∩ D = P ∩ N = D ∩ N = ∅, where P is the set of promoters, D is the set of demoters, and N is the set of converters.
A mapping to nil in the propagation model indicates a combination that the user wishes to forbid. In this case the hierarchical heterogeneous propagation search is not invoked and the operation remains invalid. Once the hierarchical heterogeneous propagation search has been invoked, it will ignore all nil entries.
Two directed graphs are built: the promotion graph from P∪N and the demotion
graph from D∪N. The vertices are the types of uncertainty modeling techniques,
and the edges are the casting operators. These may be disconnected graphs, should
the user or propagation model wish to exclude certain combinations. The hierarchical
heterogeneous propagation algorithm is a path search to find a suitable combination of
parameter types for which a (non-nil) mapping in the propagation model can be found.
Should such a combination not be found, then the operation results in an error.
The graphs are walked for each parameter, minimizing a cost function. We choose
the cost function to be the number of edges in the path. Alternative cost functions
require weightings or other annotations to be added to the graphs and this is beyond
the scope of this project. We use a parameter, called the favored direction, which
dictates whether the promotion or demotion graph is walked first. It is also possible
to use the complete casting graph (i.e. the graph of P∪D∪N), however this does not
provide control over the favored direction.
To illustrate, consider a system that supports three data types: quantities, estimates,
and intervals. In this system the following promotion operators have been defined:
quantity→estimate and estimate→interval. The following demotion operators have
been defined: interval→estimate and estimate→quantity. The promotion graph will
be quantity→estimate→interval and the demotion graph will be interval → estimate
→ quantity (see Figure 5.11). For the purpose of this example, the propagation model
only defines arithmetic between parameters of the same data type and the hierarchical
heterogeneous propagation method is promotion favoring. Adding a quantity to an interval will therefore find a solution that converts the quantity parameter to an interval using two edge hops. Thus, the system will promote the quantity to an estimate, then
to an interval, then perform the interval arithmetic.
To illustrate the previous example using numbers, consider x = 5 and y = [2, 3]. Then,
promotion direction: Quantity → Estimate → Interval
demotion direction: Interval → Estimate → Quantity

Figure 5.11: Sample Promotion/Demotion Graph
x + y = 5 + [2,3] , incompatible
→ ~5 + [2,3] , incompatible
→ [5,5] + [2,3] , success
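The search in this worked example can be sketched as a breadth-first walk over the casting graph. This is a simplified Python illustration under the example's assumptions (only same-type signatures in the propagation model, promotion casts only, tuple-based intervals); the names and the fixed target order are my own, and a full implementation would also walk the demotion graph and minimize the number of casts:

```python
from collections import deque

# Casting operators for the example: promotion moves a value up the
# detail hierarchy (quantity -> estimate -> interval).
PROMOTE = {
    ("quantity", "estimate"): lambda v: v,        # 5 -> ~5 (value unchanged)
    ("estimate", "interval"): lambda v: (v, v),   # ~5 -> [5,5]
}
UPM = {
    ("+", "interval", "interval"):
        lambda a, b: (a[0] + b[0], a[1] + b[1]),  # interval arithmetic
    ("+", "quantity", "quantity"): lambda a, b: a + b,
}

def cast_path(src, dst, graph):
    """Breadth-first search for a chain of casting operators from src to dst."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        t, path = queue.popleft()
        if t == dst:
            return path
        for (a, b), cast in graph.items():
            if a == t and b not in seen:
                seen.add(b)
                queue.append((b, path + [cast]))
    return None

def add(xtype, x, ytype, y):
    """Resolve '+' by casting both operands to a common type with a UPM entry."""
    for target in ("quantity", "estimate", "interval"):
        method = UPM.get(("+", target, target))
        px, py = cast_path(xtype, target, PROMOTE), cast_path(ytype, target, PROMOTE)
        if method and px is not None and py is not None:
            for cast in px:
                x = cast(x)
            for cast in py:
                y = cast(y)
            return method(x, y)
    raise TypeError("no propagation path found")

# x = 5 (quantity), y = [2, 3] (interval):
result = add("quantity", 5, "interval", (2, 3))
# result == (7, 8), i.e. [5,5] + [2,3] = [7,8]
```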
Suppose that the promotion from estimate to interval was not defined, i.e. the pro-
motion graph consists only of quantity→estimate. In this case the promotion graph will
not yield a result and the hierarchical heterogeneous propagation search will therefore
fall back on the demotion graph. The solution of demoting the interval parameter to a
quantity is found and standard arithmetic is employed.
The hierarchical heterogeneous propagation mechanism makes propagation mod-
els simpler and easier to manage. It will produce the best results for propagation mod-
els that have well defined casting operators. However, there is less control over the
accuracy of the results as it depends on the casting operators to maintain the semantic
meaning of the uncertainty information, which may not always be reliable.
Depending on the complexity of the uncertainty modeling hierarchy and the fre-
quency of calls, the searches may take up noticeable computing resources. In practice
this can be alleviated by either caching results, or by using hierarchical heterogeneous
propagation offline to generate a more fully defined propagation model. This offline
process involves inserting propagation methods that incorporate the conversion process.
5.4 Summary
This chapter described the integration of information uncertainty modeling techniques
into a unified information uncertainty framework and methods for automating the prop-
agation of uncertainty information.
The encapsulation approach to information uncertainty intrinsically associates in-
formation and its uncertainty. This ensures that uncertainty parameters do not become
separated; either from each other or from the related unit of information. The sys-
tem can now manage the uncertainty information, since it is intrinsically aware of it.
The unified information uncertainty framework details the conceptual relationships between different modeling types and facilitates the formulation of strategies for conversion between types.
An uncertainty propagation model is a means for automating uncertainty propaga-
tion: it catalogs methods for handling different combinations of uncertainty modeling
types under different operations. Where the propagation model does not explicitly
define an operation for a particular combination of uncertainty modeling techniques,
the hierarchical heterogeneous propagation mechanism can be used. Hierarchical het-
erogeneous propagation uses a hierarchical structure to implicitly provide methods for
propagation. This has the effect of simplifying the propagation model and increasing
the robustness of the propagation mechanism.
CHAPTER 6
Uncertainty Abstraction for Visualization
6.1 Motivation and Objectives
Information uncertainty visualization is made difficult by the diversity of modeling techniques. Visualization techniques tend to be created for particular types,
which makes it challenging to subsequently change the uncertainty information. It
would therefore be beneficial to have a coherent visualization framework that is ca-
pable of simplifying the visualization technique selection process while avoiding data
type lock-in. This chapter presents two components that make up such a framework.
Firstly, a user-objectives approach to information uncertainty visualization aids users
in selecting visualization techniques according to their visualization aims. The sec-
ond contribution consists of uncertainty abstraction models, which enable visualization
techniques to be built in a data-type-independent manner.
It is commonly recognized that formal approaches to visualization are required to
avoid the disadvantages of ad hoc solutions. Visualization has traditionally followed
a data-driven approach (e.g. [15, 16, 115]), where the nature and structure of the data
set forms the basis of visualization choices. This approach provides a coherent and
structured framework, but it does not take into account the needs of tasks or users.
More recent work has taken a task-driven approach, which addresses the specific requirements of application domains by guiding visualization based on the task at hand
(e.g. [11, 57, 82]). This approach better reflects the needs of the user, but is not suffi-
ciently generic: The techniques may work very well in one domain, but might not be
transferable to other domains unmodified.
We take a user-objectives approach, which aims to reduce the limitations of fo-
cusing on specific tasks, while simultaneously providing a coherent framework. An
objective represents a goal that a user aims to achieve. Objectives are at a more abstract and generic level than tasks, making it possible to create a coherent visualization framework that is generic while still focusing on user needs.
There is a lack of consistency between current information uncertainty visualiza-
tion techniques, since they are typically developed for particular data structures. Un-
certainty abstraction models provide a consistent interface between the visualization
technique and the uncertainty modeling technique. These can be used to overcome
this issue and therefore offer a broader base of visualization techniques from which to
choose. Three uncertainty abstraction models are described here: the Unified
Uncertainty Model, the Dual Uncertainty Model, and the Quad Uncertainty Model.
Each provides differing levels of discrimination between alternative views of uncer-
tainty. These models can also be composed recursively.
This chapter is organized as follows. Section 6.2 describes the user-objectives for
information uncertainty visualization. A computer-aided selection algorithm is pre-
sented to show how user-objectives can be incorporated into a visualization system.
Section 6.3 details the uncertainty abstraction models, including theory, notation, de-
sign, and use. Two case studies follow. The first, in Section 6.4, examines how user-
objectives can be used to visualize uncertainty for financial decision support. The
second examines simplification of business process specifications using the relevancy
objective and is presented in Section 6.5.
6.2 User-objectives for Information Uncertainty Visualization
Our work has identified five visualization user-objectives that relate to information uncertainty. These are possibility, reliability, relevancy, uncertainty structure discovery,
and ignore uncertainty. Each objective relates to a specific type of insight that the user
is seeking. Table 6.1 illustrates the essential differences between the visualization ob-
jectives by listing an English language query that characterizes questions suitable for
each objective along with an example of where the objective would be used.
Visualization techniques transform information into visual elements. Figure 6.1
provides a visual illustration of the conceptual differences between each of the ob-
jectives. The graphs indicate the relationship between the visibility and the level of
uncertainty. Visibility refers to the degree to which the viewer’s attention is drawn to a
feature.
Objective | Characteristic Query | Example of Use
Possibility | "What is possible?" or "To what extent could I be wrong?" | Choosing from or comparing a number of alternative options
Reliability | "What is most likely?" or "What is most certain?" | Decision support, with degree of confidence
Uncertainty Structure Discovery | "How certain or uncertain is this?" or "What is the structure of the uncertainty?" | Evaluation of the uncertainties involved in a particular scenario
Relevance | "What is relevant?" | Simplification of a complicated system or intelligent linked views
Ignore Uncertainty | "What if there were no uncertainty?" | Situations where the uncertainty is small enough to be tolerable

Table 6.1: Information Uncertainty Visualization Objectives

• The possibility objective makes no distinction between degrees of uncertainty, provided it is non-zero. In other words, as long as an event is possible, it is shown to the user.
• The reliability objective gives higher visibility to information that is more cer-
tain.
• The structure discovery objective brings different levels of uncertainty to the
attention of the user, to provide insight into the structure of the uncertainty.
• The relevancy objective shows only information with a high certainty of rele-
vance to the user’s task.
6.2.1 Analysis of User-Objectives

6.2.1.1 Possibility Objective
The possibility objective is motivated by the characteristics of the non-specificity class
of uncertainty modeling methods and answers the question “what are competing pos-
sibilities?”. The purpose for the user is to gain an overview perspective and be aware
of the various possibilities. Often the user is concerned by the range of alternatives to
a particular interpretation of information.
The possibilities may be either discrete or continuous. Discrete possibilities represent a finite number of distinct alternatives. An example of a discrete possibility visualization is a graph of several potential profit predictions plotted on the same axes.

Figure 6.1: Graphs Illustrating the Visual Treatment of Information with Variable Degrees of Uncertainty under Different Objectives (panels: Possibility, Reliability, Structure Discovery, and Relevancy Objectives)
Continuous possibilities are commonly used to model error, where the true value is one
of a continuous range of possible values. An example of visualization using continuous
possibilities is a 3D volume model in medical imaging applications where the extent
of the potential error introduced during the scanning process is made explicit by using
translucent regions (this example can be seen in [51, pp.9]). The possibility objective
is the same for all information uncertainty modeling techniques: provided the degree
of certainty is not zero, the information is considered a possibility.
6.2.1.2 Reliability Objective
Reliability-based visualization expresses the degree of confidence in information. In
this case, the user wishes to understand the extent to which they can rely on the infor-
mation. This visualization objective mirrors the minimal uncertainty principle, where
the user is most interested in information with low uncertainty.
Unlike the possibility objective, where no explicit attention is given to the degree
of the uncertainty, the reliability objective requires this information to be available.
As such, variables using a non-specificity class of information uncertainty modeling
technique are not strictly suitable for reliability objectives.
An example of reliability visualization is a graph of projected profits, where the
projections are more translucent for regions that are less likely. More than one source
of uncertainty can be visualized concurrently by mapping them to different visual fea-
tures.
6.2.1.3 Uncertainty Structure Discovery Objective
The structure discovery objective seeks exposition of the uncertainty itself. The pur-
pose is to draw attention to the uncertainty within the information to better understand
its structure.
Similar to the reliability objective, the degree of uncertainty must be present within
the information uncertainty modeling technique. The degree of uncertainty is mapped
to visual features in order to distinguish information based on its uncertainty. Unlike
the reliability objective, the user is not mainly concerned with the most certain infor-
mation. To illustrate, the purpose of the visualization may be to highlight regions of
high uncertainty. One example that draws attention to high uncertainty is a volume
visualization that uses the degree of uncertainty to determine opacity, with regions of
higher uncertainty being more opaque [25].
An example of a situation with a structure discovery objective is where the user
wishes to understand the uncertainty associated with the predicted outcomes should they proceed with a particular decision. The visualization could be a graph of projected sales with the degree of uncertainty being color-coded. The user may then look
at the graph and ask the question “how confident are we about making x sales?” and
the answer will be given by the color of the predicted sales at x.
6.2.1.4 Relevancy Objective
Relevancy visualization displays information that is relevant to the task of the user.
These techniques necessarily contain introduced error, since the determination of the
degree to which information is relevant to the user is itself uncertain. Additionally, the
primary use for relevancy visualization is to reduce the amount of information in the
visualization. Reduction adds a degree of uncertainty due to its non-reversible nature,
where multiple variations of original data sets can result in the same reduced set.
Since visualizations for the relevancy objective order features according to their
relevance to the user’s task, the degree of relevancy must be determined for each fea-
ture. To achieve this, a criterion function is defined that maps features to their degree
of relevancy [110]. Different user tasks will have different criteria for what constitutes
relevant information and constructing a suitable criterion function will typically be task
specific.
Although uncertainty is introduced during the visualization process, this does not
preclude investigation of uncertainty contained in the data model. Where there is un-
certainty in the data model, the criterion function must consider this. An example
where the criterion function depends on the uncertainty in the data model is where the
user is seeking to identify areas for improving measurement accuracy, giving variables
with a high amount of potential error greater relevance.
The objective of the user is to see only the information that is of sufficient relevance
to a task. Less relevant information is either aggregated until it becomes relevant or
discarded entirely.
6.2.1.5 Ignore Uncertainty Objective
Sometimes the user will want to ignore uncertainty for the purposes of visualization.
This is effectively the degenerate case of uncertainty visualization, where standard
techniques are used and the uncertainty information is excluded.
6.2.2 A Computer-Assisted User-Objectives Selection Method

The user-objectives can be applied manually, as was done in the case study of Section 6.4. However, there exists repetition and commonality that can be exploited to
create an automated elicitation process. Automated user-objectives elicitation can be
used to augment the visual mapping process. For example, a computer guided system
that presents the user with a list of available variables can be extended to include an
interactive objectives selection tool. This tool asks the user a series of questions and
then offers suggested objectives. The interactive tool is outlined in Algorithm 1, where
Q1 through Q5 refer to the questions in Table 6.2.
Questions in the decision tree:
1. “Is the query about all possibilities (including highly uncertain information)?”
2. “Is the query about what is most “likely” or “sure”?”
(a) “Is it more important to show “how sure” or “only the most sure”?”
3. “Is the query about the level of uncertainty in the variable itself?”
4. “Is the query a grouping/filtering/processing of information according to a subjective or otherwise uncertain measure?”
5. “Is it necessary to reduce the clutter (simplify the visualization)?”
Table 6.2: Questions Used to Elicit the User-Objective
Once the objectives have been specified, the visual mapping software can use this
information to help guide visualization choices. For example, if the user has a pos-
sibility objective for a particular variable, then the visualization system will propose
adding geometry to expose all possible values for that variable. On the other hand,
if it were a structure discovery objective, then the visualization system will propose a
pseudo-color mapping to highlight the different degrees of certainty.
Algorithm 1 User-objective Selection Method

present user with list of variables
user chooses the variables they wish to include in the visualization
for each variable do
    o ← {}
    if Q1 is true then
        o ← o ∪ { possibility }
    end if
    if Q2 is true then
        if Q2a is first option then
            o ← o ∪ { reliability }
        else if Q2a is second option then
            o ← o ∪ { ignore }
        else
            o ← o ∪ { reliability, ignore }
        end if
    end if
    if Q3 is true then
        o ← o ∪ { structure }
    end if
    if Q4 is true then
        o ← o ∪ { relevancy }
    end if
    if Q5 is true then
        o ← o \ { possibility, structure }
    end if
    if o = {} then
        o ← { ignore }
    end if
end for
group variables with similar objectives
present groupings to user in a list
if only one objective then
    automatically select it
else
    allow the user to specify which ones (e.g. multi-select)
end if
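A minimal Python sketch of the per-variable portion of Algorithm 1 follows. The dictionary keys and option strings used for the question answers are our own assumptions; an interactive system would collect these answers from the dialogue of Table 6.2.

```python
def select_objectives(answers):
    """Map decision-tree answers (Q1..Q5, plus Q2a) to a set of user-objectives.

    `answers` is a dict such as {"Q1": True, "Q2": True, "Q2a": "how sure"};
    the key names and option strings are illustrative assumptions only.
    """
    o = set()
    if answers.get("Q1"):
        o.add("possibility")
    if answers.get("Q2"):
        q2a = answers.get("Q2a")
        if q2a == "how sure":            # first option of Q2a
            o.add("reliability")
        elif q2a == "only most sure":    # second option of Q2a
            o.add("ignore")
        else:
            o.update({"reliability", "ignore"})
    if answers.get("Q3"):
        o.add("structure")
    if answers.get("Q4"):
        o.add("relevancy")
    if answers.get("Q5"):                # clutter reduction removes detail views
        o -= {"possibility", "structure"}
    if not o:
        o = {"ignore"}                   # default when no objective applies
    return o
```

For example, answering yes to Q1, Q3, and Q5 leaves no detail views standing, so the method falls back to the ignore objective, matching the behavior of the pseudocode.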
The objectives-selection method can be used with adaptive visualization
systems. For example, the visualization task network [11] can be extended to account
for objectives by learning different weights for each objective. The objectives form the
top level of the new visualization task network, following which are domain-specific
tasks, followed by visualization techniques, and then the visual feature mappings.
In summary, the user-objectives approach to information uncertainty visualization
offers a mechanism for aiding the user to select visualization choices. Objectives rep-
resent the type of insight that the user is aiming to gain from their visualizations. The
advantage of this approach is that the visualization requirements are not driven by the
data type, reducing data type dependency. Furthermore, the objectives approach avoids
issues of domain specificity that typically result from task-driven approaches.
6.3 Uncertainty Abstraction Models

The user-objectives approach aids users designing their visualization; however, it does not solve the data type dependency issue for implementors of visualization techniques.
Visualization techniques require a mechanism that enables them to work across differ-
ent uncertainty modeling techniques. This capability can be provided through the use
of uncertainty abstraction models, which specify an interface between visualization
techniques and information uncertainty modeling techniques.
This section investigates different abstraction models for information uncertainty
that are designed for visualization. Three abstract uncertainty models are described:
the Unified Uncertainty Model (UUM), which is the simplest of the three; the Dual
Uncertainty Model (DUM), which differentiates between possibility and probability;
and the Quad Uncertainty Model (QUM), which provides the most amount of detail.
These models can be applied recursively, where the degree of certainty for each view
can itself be described by one of the abstract uncertainty models.
6.3.1 The Unified Uncertainty Model

The advantage of the UUM is that it is simpler and suits the needs of many users.
The disadvantage is that there exists the potential for different uncertainty types to be
confused with one another (e.g. it may be hard to distinguish probabilities and fuzzy
values). This disadvantage is a concern when multiple different types of variables are
mixed in the same visualization.
We define a plural value, which is a modeling-technique-independent function f : ℝ → [0,1] that maps real values to a degree of uncertainty. The visualization system can focus on providing means for mapping the range [0,1] to visual elements, rather than providing methods specific to each type of modeling technique.
Our approach is inspired by the description of fuzzy sets, using the membership
function μ, as a generalized form of crisp sets (e.g. [77]). A fuzzy set is defined using
membership function μ: δ = μ(v) where δ is the level of membership ranging from 0
(definitely not a member) to 1 (definitely a member) and v is the candidate value. Thus
the candidate value 28 is half in the fuzzy set long if long(28) = 0.5. This method of
definition can be applied to crisp sets; for example, the crisp set A can be defined by:

δ = 1 if v ∈ A, 0 otherwise
We expand this reasoning to other uncertainty modeling types. Thus the visualiza-
tion accessible form of information uncertainty is a function:
δ = f (v)
where δ is the degree of certainty ranging from 0 to 1, v is the candidate value, and f ()is the degree of certainty function. Traditional numbers can be considered to be a spe-
cial type of uncertainty modeling technique: the technique specifying total certainty.
The constant c is described by the following uncertainty function:

δ = 1 if v = c, 0 otherwise
δ = 0 indicates impossibility. The true meaning of δ varies with the type of uncertainty
being modeled. For non-specificity types, δ is either 0 (not possible) or 1 (possible).
For membership methods, δ ranges from 0 (definitely not a member) to 1 (definitely a
member). For probabilities and belief, δ ranges from 0 (impossible) to 1 (certain).
Some visualizations are intended to compare the values of δ with each other. De-
pending on the uncertainty of the variable in question, the range of δ can vary. In these
circumstances it is desirable to normalize δ such that min(δ) = 0 and max(δ) = 1. For example, a probability density function mapped directly to opacity will often render too transparently; rescaling so that the mode maps to full opacity restores a usable visual range.
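The UUM can be sketched as plain functions from candidate values to [0,1]. The constructor names below are ours, not the thesis implementation; they build δ-functions for a constant, a crisp set, and a Gaussian density normalized so that max(δ) = 1, as suggested above for comparison visualizations.

```python
import math

def certain(c):
    """Total certainty: delta = 1 only at the constant c (the degenerate case)."""
    return lambda v: 1.0 if v == c else 0.0

def crisp(members):
    """Crisp set membership: delta is either 0 or 1."""
    s = set(members)
    return lambda v: 1.0 if v in s else 0.0

def gaussian_normalized(mean, sd):
    """Gaussian density rescaled so that max(delta) = 1; a raw density
    mapped to opacity would often be too transparent."""
    peak = 1.0 / (sd * math.sqrt(2 * math.pi))  # density at the mean
    def delta(v):
        d = peak * math.exp(-((v - mean) ** 2) / (2 * sd ** 2))
        return d / peak                          # normalize to [0, 1]
    return delta
```

A visualization layer can then map any such δ-function to visual elements without knowing which modeling technique produced it.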
6.3.2 The Dual Uncertainty Model

The DUM views uncertainty information from two different points of view: possibility,
where potential values are given by their possibility to represent the actual value; and
probability, which specifies the likelihood of alternative values or the belief in the
likelihood.
Figure 6.2 shows the components of the uncertainty model. At the highest level,
there is a singular representative value that is considered to be a reasonable approxi-
mation. This value is used in instances where the visualization only wishes to present
a single choice. The representative value is most commonly one of the most possible
and most likely values. To conform to the principle of representation, the representa-
tive value should be a valid collapse for the uncertainty model. For both the possibility
and probability views, there may be multiple ranges of values. This is because not all
uncertainty models are convex. For example, a variable might only validly take on two
distinct values. In this case values in between these two are not valid. The degree of
possibility or probability of values can also vary across the alternatives. For example,
the probability of the mean will be higher than other values in a Gaussian probability
distribution.
Figure 6.2: Schematic Illustration of the Dual Uncertainty Model (a singular view holding the representative value and an is-certain flag, a possibility view holding a set of ranges, and a probability view holding a set of ranges)
The notation used for DUM consists of the singular value, its Boolean certainty
flag, and a large left pointing angle bracket to the right of which there are two lines. The
top line lists the possibility function and the second line lists the probability function:
a = value ⟨ possibility
            probability
The degree of certainty specified by the range of possibility and probability functions
must be in [0,1]. The domain is the universe of discourse, which is typically the set of
real numbers.
Common instances of the dual uncertainty model for several different types of in-
formation uncertainty are described next using DUM notation.
The unknown value type is not typically used as a data structure. However, for
completeness we include its DUM representation:
unknownValue = NaN ⟨ 1
                     0
where NaN stands for 'Not a Number' and should result in an error when used. The possibility is the full range of all possible values, as any value is technically possible. There is a theoretically even probability that it is any of an infinite number of values, so the probability of any single value chosen at random vanishes. In other words, the probability tends to 0 as the collapse set tends to infinity.
The absolute certainty type is represented as follows:

absoluteCertainty = x ⟨ μ(x) = 1, μ(x̄) = 0
                        P(x) = 1, P(x̄) = 0

where x̄ is any value other than x. It is possible to identify absolute certainty because
the collapse set has a single value and both the possibility and probability of that value
are unity. The absolute certainty type is the only type legally able to return a value of
1 for probability. The absolute certainty type can be seen as a degenerate probability
where the probability is exactly 1.
Both the uninformed estimate and uncertainty ignorance types share the same
DUM profile:
uninformedEstimate = x ⟨ 1
                         0
where x is the estimated value of the variable. These reflect infinite potential to be any
value using the same mechanism as the unknown value type. There is infinite potential
since the value is uncertain, but there is no other description of the uncertainty space.
The classic set based constructions, such as the possibility set, interval, and set of
intervals types have a similar profile:
possibilitySet = x ⟨ 1 ∀{a : a ∈ s}, 0 otherwise
                     p ∀{a : a ∈ s}, 0 otherwise
where s is the set and p is a constant that represents the even probability over the
interval. In the case of a discrete possibility set, p = 1/|s|. In the case of a set of intervals,
p is even over all intervals.
For the rough set and fuzzy set types there are grades of possibility and the possibil-
ity view is already explicitly defined by the data structure. However, the data structure
does not define the probability of any values and often an even probability is assigned
for all possible collapses:
fuzzySet = x ⟨ μ
               p ∀{a : μ(a) > 0}, 0 otherwise

where μ is the membership function and p is the even probability assigned over the support.
The probability distribution types explicitly declare the probability, which implic-
itly provides possibility. Any non-zero probability is possible, even though it may be
highly improbable. This can be a problem because many commonly used distribu-
tions are supported on the whole number line. For example, the normal distribution
is asymptotic to zero as the potential value tends to either ∞ or −∞. Therefore imple-
mentations may wish to use a cut plane on the distribution function to eliminate highly
improbable values from consideration. Doing so simplifies the visualization. Thus, if
α is the height of the cut plane and P is the probability distribution function,
probabilityDistribution = x, false ⟨ 1 ∀{a : P(a) > α}, 0 otherwise
                                     P
There is implicit conversion using the DUM model. Accessing the possibility at-
tributes, for example, performs an implicit conversion to a possibility modeling type.
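As an illustration of the cut-plane construction described above, the possibility view of a probability distribution type can be derived by thresholding its density function. The choice of a Gaussian and the value of α here are our own assumptions for the sketch.

```python
import math

def normal_pdf(mean, sd):
    """Probability density function of a normal distribution."""
    k = 1.0 / (sd * math.sqrt(2 * math.pi))
    return lambda v: k * math.exp(-((v - mean) ** 2) / (2 * sd ** 2))

def possibility_from_pdf(pdf, alpha):
    """DUM possibility view via a cut plane: any value whose density
    exceeds alpha is treated as possible (delta = 1), eliminating
    highly improbable tails from consideration."""
    return lambda v: 1.0 if pdf(v) > alpha else 0.0

P = normal_pdf(0.0, 1.0)
pos = possibility_from_pdf(P, alpha=0.05)  # alpha chosen for illustration
```

With this cut plane, values near the mean are possible while far tail values (e.g. beyond about ±2 standard deviations) are excluded, which is the simplification the text describes.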
6.3.3 Design and Use

In practice, the plural value is defined as a set of uncertainty ranges and provides an
iterator interface returning ranges. Ranges are either given as a sequence of convex
regions, in the case of a continuous function; or as a sequence of values, in the case of
a discrete function. Each range provides degree of uncertainty information. Figure 6.3
shows a UML diagram for the plural value and uncertainty range types along with
the DUM interface. To support the DUM, all uncertainty modeling technique objects
must implement the IDUMMethod interface. The UUM works similarly, except that it
returns just one plural value.
Figure 6.3: UML Diagram of the Dual Uncertainty Model
Discrete and continuous plural values require different visualization approaches.
For continuous plural values, the iterateByValue and iterateByUncertainty methods
sample along the particular dimension. Therefore the visualization should treat sub-
sequent values as connected. For discrete plural values, the iterateByValue will return
sequential values, which the visualization should treat as distinct. Algorithm 2 is a
generic visualization algorithm to handle plural values.
Algorithm 2 Plural Value Plot

for each r in ranges do
    if r.type is DISCRETE then
        for each u in r.iterateByValue do
            plot u distinctly
        end for
    else
        for each u in r.iterateByValue do
            plot u connectedly
        end for
    end if
end for
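A Python sketch of Algorithm 2 follows. The range representation and the `plot_distinct`/`plot_connected` callbacks are our assumptions, standing in for whatever primitives the host visualization provides.

```python
def plot_plural_value(ranges, plot_distinct, plot_connected, samples=32):
    """Generic plural-value plot: discrete ranges are drawn as separate
    marks, continuous ranges as connected samples along the value axis."""
    for r in ranges:
        if r.get("type") == "DISCRETE":
            for v, delta in r["values"]:          # (value, certainty) pairs
                plot_distinct(v, delta)
        else:
            low, high = r["interval"]
            delta_fn = r["delta"]                 # degree-of-certainty function
            for i in range(samples):
                v = low + (high - low) * i / (samples - 1)
                plot_connected(v, delta_fn(v))    # subsequent samples connect
```

Because the plotting callbacks are parameters, the same traversal serves the cluster plot, line graph, and parallel plot extensions discussed below.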
In the case of UUM, there will only be one way to visualize the information. How-
ever, in the case of DUM, the user will need to specify the view that they are interested
in when building the visualization. We now consider how the plural value is used to
extend three common visualization techniques: the cluster plot, the line graph, and the
parallel plot.
The cluster plot is an information visualization technique that plots multiple values
on the same axis. In the traditional cluster plot, each unit of data occupies a single
point. The uncertainty cluster plot instead samples the possible values over the axis,
usually using opacity to indicate the degree of certainty. This is achieved using Algo-
rithm 2. In the case of an interval, the degree of certainty will be one over the range,
resulting in a line for a 1-dimensional cluster plot.
A line graph is built from one or more series of values. It is a 2-dimensional graph:
the value plotted against one axis, and the position within the series against the other
axis. These points are then connected by a sequence of line segments. Extending the
line graph to show uncertainty requires that every value be a plot of the uncertainty
space, all of which is then connected to the next value plot in the series. In the case
of continuous plural values, this results in a polygon (for example, Figure 4.7). The
surface of the polygon can be textured by a sampling of the degrees of certainty over
the range using graphics hardware (for example, Figure 3.1 (d)).
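For continuous plural values modeled as simple per-point intervals, the polygon described above can be computed by tracing the upper envelope left to right and the lower envelope back. This is a sketch under the assumption of one convex range per series point; real plural values may carry multiple ranges.

```python
def uncertainty_band(series):
    """Build the outline polygon for an uncertainty line graph.

    `series` is a list of (low, high) intervals, one per x position;
    returns polygon vertices as (x, y) pairs: the upper edge
    left-to-right, then the lower edge right-to-left.
    """
    upper = [(x, hi) for x, (lo, hi) in enumerate(series)]
    lower = [(x, lo) for x, (lo, hi) in reversed(list(enumerate(series)))]
    return upper + lower
```

The resulting vertex list can be handed to any polygon renderer, with the interior textured by sampling the degrees of certainty as the text suggests.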
Parallel plots plot multiple dimensions individually in sequence and use line seg-
ments to join points belonging to the same data unit together. There are two common
approaches to uncertainty in parallel plots. The first uses blurring or opacity to indi-
cate uncertainty [39], the second uses a third dimension [91]. The first technique can
be achieved in much the same way as the line graph given above. However, the second
method is slightly more complicated, as the sampling of the uncertainty now needs to
produce a geometric shape. Thus, instead of sampling the uncertainty space to deter-
mine the texture, the degree of certainty is instead used as a height value. A convex
hull is formed between the height values at this dimension and the next.
6.3.4 Alternative Models

For most users the UUM and DUM will provide sufficient granularity over the uncertainty space for visualization. However, several further models can be used that expand
on these. There are two basic approaches to devising uncertainty abstraction models.
The first is to further differentiate the views of uncertainty, using the Quad Uncertainty
Model. The second approach is to apply recursion. We briefly discuss these options
below.
6.3.4.1 The Quad Uncertainty Model
The Quad Uncertainty Model (QUM) offers an even greater level of detail than either
the DUM or UUM. Both the possibilistic and probabilistic views can be further detailed
by separating necessity from possibility and belief from probability. This results in four
uncertainty degree functions: possibility, necessity, plausibility, and belief. Figure 6.4
shows this relationship between the variable, the possibilistic and probabilistic views,
and the four plural values.
Figure 6.4: Illustration of the Quad Uncertainty Model (a variable splits into possibilistic and probabilistic views, which in turn split into possibility, necessity, probability, and belief plural values)
The QUM may be appropriate for sophisticated uncertainty modeling users who
require the ability to distinguish these views. The mathematical relationships between the four views follow. For probability, bel(U) ≤ pl(U) and pl(U) = 1 − bel(Ū) (e.g. Dempster-Shafer, see [107]). For possibility, nec(U) ≤ pos(U) and nec(U) = 1 − pos(Ū), where Ū denotes the complement of U.
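These dualities can be checked on a toy Dempster-Shafer example. The frame of discernment and the mass assignment below are invented for illustration only.

```python
def belief(mass, A):
    """bel(A): total mass of focal elements entirely contained in A."""
    return sum(m for focal, m in mass.items() if set(focal) <= set(A))

def plausibility(mass, A):
    """pl(A): total mass of focal elements that intersect A."""
    return sum(m for focal, m in mass.items() if set(focal) & set(A))

# Hypothetical frame and basic mass assignment (masses sum to 1).
frame = {"rain", "dry", "snow"}
mass = {frozenset({"rain"}): 0.5,
        frozenset({"rain", "snow"}): 0.3,
        frozenset(frame): 0.2}

A = {"rain", "snow"}
# bel(A) <= pl(A), and pl(A) = 1 - bel(complement of A)
```

On this example bel(A) = 0.8 and pl(A) = 1.0, consistent with both relations above.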
The disadvantage of detailed uncertainty models like the QUM is that each additional view of uncertainty requires additional visualization considerations, which places a burden on the creators of visualization techniques. The strength of multiple views of uncertainty is that they can be shown concurrently within a single visualization. For example, Lowe et al. [70] incorporated both belief and plausibility for time
series visualizations.
6.3.4.2 Recursion
The models explored thus far (UUM, DUM, QUM) can be composed recursively. This
is similar to type-2 fuzzy sets [77], where the degree of certainty is itself subject to a
degree of certainty. Figure 6.5 illustrates this for a recursive UUM. For each possible
collapse on the real number line, there is a degree of certainty. Each degree of certainty
is itself only certain to a degree, as indicated by the height and color. Such recursion
can theoretically continue ad infinitum, although it quickly becomes impractical.
Figure 6.5: Example of a Recursive UUM (axes: real number line, degree of certainty, and degree of certainty of the degree of certainty)
This type of composition is helpful when dealing with uncertainty modeling tech-
niques that include this information. However, for many of the standard uncertainty
models this information is not available.
Recursive composition of abstract uncertainty models will likely find limited use,
since few real-world problems require this level of detail. However, for application
domains that do adopt this level of detail, the recursive composition offers the same
advantages for visualization as the non-recursive counterparts do for information un-
certainty visualization.
6.4 Case Study: User-Objectives in Financial Decision Support

In this case study, we construct visualizations according to user-objectives for a financial model incorporating uncertainty. We show how the visualizations can give the
investor a clearer understanding of the investment from various perspectives. The fi-
nancial model used is for an investor looking to buy a house and sell it again within the
next twenty years. The investor is hoping to earn rental from the property at all times
throughout possession and is aiming for a substantial appreciation in the value of the
property at the time of sale.
There are numerous variables to be considered by the investor both at the time of
purchase and throughout the life of the investment. These include the purchase price
and deposit amount; the loan and debt interest rate; the investor's salary and rental return from the property; the subsequent effects of taxation (based on depreciation, other deductions and rental income); and other miscellaneous expenses. There are uncertainties associated with all these variables, both initially and accumulated over time.
Variables such as salary are known with certainty at the time of purchase but become
increasingly uncertain as time progresses.
The case study addresses the following questions:
• How will changes in interest rates affect profitability? (possibility objective)
• How will changes in house prices affect profitability? (possibility objective)
• How will changes in interest rates and house prices affect profitability? (possi-
bility objective)
• What is the most likely profitability resulting from changes in interest rates?
(reliability objective)
• What is the likely profitability resulting from changes in interest rates and house
prices? (reliability objective)
• What is the likelihood of making $100k by the 20th year? (structure discovery
objective)
• What effect do interest rate changes have on profitability over the next 5, 10, 15
and 20 year periods? (relevancy objective)
• In the expected future climate of changing interest rates and house prices, what
is the optimum time to sell the house in order to maximize profits? (relevancy
objective)
How will changes in interest rates affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability that would result
from changes in the interest rate.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which interest rates might vary. No other likelihood or
probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly delineated.
Discussion The graph in Figure 6.6 shows the complete range of scenarios superim-
posed on the same axes. The top edge represents the optimistic case where interest
rates fall by 0.5% per annum, whereas the bottom edge represents rising interest rates.
As can be seen from the graph, the impact on profitability varies substantially. This
graph shows the extent to which changes in the interest rate can affect the overall prof-
itability of the investment.
Figure 6.6: Possible effects of interest rate movements on NPV (2D).
How will changes in house prices affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability that would result
from changes in the house price.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which house prices might vary. No other likelihood or
probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly marked.
Discussion This question requires a possibility objective similar to Question 1. In
this instance, the model is run for a number of scenarios to produce a discrete set of
forty possibilities. Each scenario is superimposed upon the same axes and the results
are shown in Figure 6.7. Lines are colored by annual property price movements.
The graph shows that the investment is worthwhile for most positive average in-
creases in property value. Color bands mark ranges of house price movement. It is
important to note that this coloring does not relate to uncertainty and is not connected
to the possibility objective. Positive value changes are colored green, neutral changes
are colored gray, and negative changes are colored red.
The same information can be portrayed in three dimensions, as in Figure 6.8. The
negative NPV volume is shaded transparent blue to give a clear separation between
positive and negative NPV values.
How will changes in interest rates and house prices affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability, only this time from
changes in two dimensions: the interest rate and house prices.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which interest rates and house prices might vary.
No other likelihood or probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly delineated.
Discussion To answer this question, we introduce the effect of changes in interest
rates. Figure 6.9 extends the previous visualization by adding a wire-frame outline
of a volume. The volume shows the possible NPV for interest rate changes of up to
±0.5% per annum change. The surface inside the frame shows the NPV for interest
rates remaining constant. Note that the scale of the NPV axis has been increased
from ±$100k to ±$200k. This volume provides the user with an indication of the
extent to which NPV can vary, but many of these possibilities have a low probability
of occurring. The surface inside the volume represents a scenario where interest rates
remain unchanged and is included to provide the user with a reference.
Figure 6.7: Possible effects of house price movements on NPV (2D).
Figure 6.8: Possible effects of house price movements on NPV (3D).
Figure 6.9: Possible effects of house prices and interest rates on NPV.
What is the most likely profitability resulting from changes in interest rates?
User Objective - Reliability This question requires a reliability objective as the user
is interested in the most likely outcomes of NPV arising from changes in interest rates.
Knowledge of Uncertainty - Probability Since NPV is based on interest rates in
this model, this question requires the probability that interest rates change.
Display Technique - Reliability The user will wish to see a display of all outcomes,
where each outcome is visualized according to its associated probability. Thereby the
set of the most likely, or reliable outcomes are highlighted, with less likely outcomes
being less visually obvious.
Discussion The answer to this question requires a reliability visualization, since the
viewer is interested in the most likely outcome. The graph in Figure 6.10 shows
changes in interest rates, with more likely outcomes being more opaque. From this
graph the viewer can determine that the most likely outcome is positive, since most of
the visibly shaded area ends above the zero line.
Figure 6.10: Most likely profitability resulting from changes in interest rates
What is the likely profitability resulting from changes in interest rates and house
prices?
User Objective - Reliability This question requires a reliability objective visualiza-
tion as the user is interested in the likely outcomes of NPV arising from changes in
interest rates and house prices.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over which interest rates and house prices can vary.
Display Technique - Reliability The user will wish to see a display of all outcomes,
where each outcome is visualized according to its associated probability. Thereby the
set of the most likely outcomes are highlighted, with less likely outcomes being less
visually obvious.
Discussion The answer to this question makes use of the probabilities in the data
model and is shown in Figure 6.11. Interest rates are expected to remain constant at
8% while the median long term house price rise is 3% per annum. We construct the
same volume that was outlined in Figure 6.9, but map the alpha value to the normalized
probability of the event. The wire-frame outline has also been added to provide con-
text. In addition, the color is red when the NPV is negative and green otherwise, which
helps distinguish positive NPV. We can determine from this figure that the most likely
outcome is positive, although there is a significant and persistent chance of a negative
result. To aid visibility, positive NPV is mapped to light green, while negative NPV
is strong red. These color choices were arbitrary and can be changed, for example, to
accommodate color-blind viewers.
The strong negative presence in the early years is due to upfront costs being in-
curred.
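The mapping described in this discussion can be sketched as a small Python function. This is an illustrative sketch only; the function name is ours, and IvySheet's actual visual mapping is performed through its visualization sheet rather than a hard-coded routine.

```python
def scenario_style(npv_value, probability, max_probability):
    """Reliability-style mapping sketch: opacity (alpha) encodes the normalized
    probability of the scenario, while hue encodes the sign of its NPV."""
    alpha = probability / max_probability if max_probability > 0 else 0.0
    color = "light green" if npv_value >= 0 else "strong red"
    return color, alpha
```

Scenarios with the modal probability are rendered fully opaque, while improbable scenarios fade toward invisibility, which is exactly the effect visible in Figure 6.11.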
What is the likelihood of making $100k by the 20th year?
User Objective - Structure Discovery For this question, the user requires the effect
of an uncertain variable (interest rates) and its effect on the NPV to be made explicit.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over the range in which interest rates can vary.
Display Technique - Structure Discovery In this question, the viewer will wish to
see a display of all outcomes and some measure of their associated certainty. It is
appropriate in this case to include more information about the structure of the uncer-
tainty across the range of outcomes, because the viewer has explicitly asked for such
an understanding.
Discussion The answer to this question requires a structure discovery objective visu-
alization, since the viewer is interested in the degree of uncertainty. The graph shown
in Figure 6.12 maps the degree of uncertainty to color. By observing the color at the
crossing of the $100k and 20 year point, the viewer can determine that the likelihood
is small.
Figure 6.11: Volumetric representation of the most likely effect interest rate changes will have on NPV.
Figure 6.12: Likelihood of NPV
What effect do interest rate changes have on profitability over the next 5, 10, 15
and 20 year periods?
User Objective - Relevancy This question requires a relevancy objective as the user
is interested in limiting the information returned for changes in profitability from vary-
ing interest rates.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over the range in which interest rates can vary.
Display Technique - Relevancy The user will wish to see a display of only relevant
outcomes, where each outcome is equally visible. Thereby the scope of all possible
outcomes is reduced to only what is relevant.
Discussion This question is best addressed with a relevancy objective, as the user
is only interested in a limited amount of information that is not intuitively available.
Figure 6.13 shows a graph that summarizes the extent of the effect into five year incre-
ments. The criterion function assigns relevance by comparing extents for
neighboring years. The reduction process aggregates less relevant information such
that the maximal extents remain. The threshold is set to display four groups. An inter-
active system could allow the user to change the number of groups arbitrarily, or alter
the criterion function.
Figure 6.13: Effect of interest rate changes grouped into 5 year periods
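The reduction process described above can be sketched in Python. The function name and the fixed-size grouping criterion are ours; the thesis's criterion function compares neighboring years and is user-alterable, whereas this sketch simply keeps the maximal extent within each group.

```python
def group_extents(yearly_extents, group_size=5):
    """Relevancy-style reduction sketch: aggregate per-year (low, high) NPV
    extents into fixed-size groups so that only the maximal extents remain."""
    groups = []
    for i in range(0, len(yearly_extents), group_size):
        chunk = yearly_extents[i:i + group_size]
        groups.append((min(lo for lo, _ in chunk),
                       max(hi for _, hi in chunk)))
    return groups
```

An interactive system could expose `group_size` (and the criterion itself) as the user-adjustable parameters mentioned above.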
In the expected future climate of changing interest rates and house prices, what
is the optimum time to sell the house in order to maximize profits?
User Objective - Relevancy This question requires a relevancy objective as the user
is interested in highlighting relevant times to sell. Furthermore, the user wishes to
associate the likelihoods of these scenarios.
Knowledge of Uncertainty - Probability The knowledge of uncertainty for this
query is of the probability over which interest rates and house prices can vary.
Display Technique - Relevancy The user will wish to see a display of only relevant
outcomes, i.e. those that highlight the optimal selling time.
Discussion This question suggests a relevancy objective since the user wishes to filter
the information by appropriate times to sell. Figure 6.14 indicates the optimum year to
sell by elevation, with immediate sale at the top and holding the property for the full
20 years at the bottom. The surface is colored according to the optimum selling time
to improve visibility, meaning that color matches elevation. The lower right corner
represents falling interest rates and high property value growth.
The yearly data is assigned a relevance according to the distance between NPV and
a target value. We chose +∞, which represents absolute profit. An alternative target
value could be based on yearly growth rate, such as 1% per annum.
Areas that indicate immediate sale (at time 0) are due to an unattractive investment
proposition (circled A). The plateau at year one represents scenarios where the upfront
costs have been partially recouped, but the investment remains poor over the long term
(circled B). The small plateau at five years is due to the tax effects of depreciation
deductions ending (circled C).
6.5 Case Study: Relevancy Objective in Business Process Management
Business Process Management is a field that encompasses management and informa-
tion technology. It includes methods, techniques and tools to design, enact, control,
and analyze operational business processes involving humans, organizations, appli-
cations, documents and other sources of information [121]. Central to this field are
the modeling languages that specify the processes, scheduling, interactions, and other
information. Graphical business process modeling languages are elegant solutions
because the user can visually interpret the process. There are many graphical busi-
ness process modeling techniques, such as YAWL [120]. For a detailed discussion
of graphical modeling languages see [119, pp. 3]. However, as the business process
grows in size, the graphical representation becomes difficult to deal with. This prob-
lem is well known to fields that use graphical languages (see e.g. [12]). While zooming
initially solves the issue of gaining an overview perspective, there is a finite limit to
the amount of zooming that can be performed before information becomes obscured.
For large models to be understood it is necessary that the level of controlled visual
processing required is reduced. Controlled visual processing refers to those activities
that require cognitive functions, such as the interpretation of text. The approach
explored here is to provide views of the specification that exclude less relevant infor-
mation. This filtering of information produces a model with lower complexity, but
thereby introduces a degree of uncertainty. This uncertainty reflects the lower resolu-
tion model’s potential for representing variations of the original model.
The aim is to construct a reduced representation for a given input specification.
Figure 6.15 shows the architecture of the system. The original graph is filtered
according to a criterion function by a BPM reduction technique to produce a reduced graph,
which is presented to the user. The user then inspects the graph and alters the param-
eters, completing a feedback loop. Figure 6.16 shows a screen shot of the prototype
system, showing the graphical user interface controls available to the user.
We aim to build a reduced graph G_R(V_R, E_R), where the R subscript denotes reduction.
The reduced graph is built such that it contains a subset of the nodes of the original
graph. In other words, a reduced graph G_R(V_R, E_R) is built from an original graph
G(V, E), such that V_R ⊂ V. A relevance factor, ε, is calculated by ε_i = C(v_i) for each
node v_i ∈ V, where C is the criterion function C : V → R and R is the set of real
numbers. C orders the nodes according to their relevance to the task of the user.
Preservation of the overall structure of the graph is achieved through identifying
important control flow nodes. The control perspective defines the flow of control
through the graph. A promising structural importance heuristic is based on the connectedness,
χ : V → Z, of the node, and its estimated position in the routing hierarchy,
φ : V → Z. Here Z is the set of integers, and φ is calculated by counting the number of
splits and subtracting the number of joins on the shortest path from s to the node,
excluding the node itself. χ is simply the number of nodes connected to this node.
ε is calculated as follows:

ε_i = χ(v_i) / min(φ(v_i), 1)
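The heuristic above can be sketched in Python. This is an illustrative sketch, not the thesis implementation: the function name is ours, and because the formula divides by min(φ, 1), which is zero or negative near the start node, this sketch assigns infinite relevance in that case, a guard the thesis leaves unspecified.

```python
from collections import deque

def structural_relevance(succ, splits, joins, s):
    """Sketch of the structural heuristic: eps_i = chi(v_i) / min(phi(v_i), 1).

    succ   : dict mapping each node to its list of successors (directed edges)
    splits : set of split nodes; joins : set of join nodes
    s      : the start node of the process graph
    """
    # chi: connectedness, i.e. the number of edges touching the node
    chi = {v: 0 for v in succ}
    for v, ws in succ.items():
        for w in ws:
            chi[v] += 1
            chi[w] += 1
    # phi: splits minus joins along the shortest path from s, excluding the node itself
    phi = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in phi:
                phi[w] = phi[v] + (v in splits) - (v in joins)
                queue.append(w)
    eps = {}
    for v in succ:
        denom = min(phi.get(v, 0), 1)
        # guard: a non-positive denominator (nodes at the top of the routing
        # hierarchy) is treated as maximal relevance in this sketch
        eps[v] = chi[v] / denom if denom > 0 else float("inf")
    return eps
```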
Another approach is to rank nodes according to a text retrieval algorithm. To im-
prove context, we introduce a notion of relevance flow, which increases the relevance
of nearby nodes. The amount of the contribution drops off with distance traveled,
including loops. The amount of the drop off is arbitrary and a constant rate, β , gives
adequate results. The algorithm that assigns relevance factors based on a text search term
for graph G(V, E) is given as Algorithm 3.
Once the degree of relevancy has been determined, the business process specifi-
cation needs to be reduced. The reduced graph must preserve the semantics of the
original graph to avoid being misleading. Semantics are preserved if all possible or-
ders of execution of the remaining nodes are unchanged from the original graph.
106 Chapter 6. Uncertainty Abstraction for Visualization
Figure 6.14: Optimum time to sell the property under different economic conditions.
[Diagram: Original Graph → (1) Criterion Function → (2) BPM Reduction → Reduced Graph → (3) Presentation → User, with a feedback loop from the user back to the parameters]
Figure 6.15: Architecture of the case study system
Figure 6.16: YAWL query: Prototype tool for the graphical business specification reduction
Algorithm 3 Text Retrieval Relevancy in Business Process Specifications
* Find S_T, the set of all nodes that contain the search term.
foreach v ∈ S_T
    * Initialize the contribution value, c ← 1.
    * Initialize the neighborhood node set, S_N ← {v}.
    while c > 0 and S_N ≠ ∅
        * Update ε for all neighbors: ε′(n) ← ε(n) + c for all n ∈ S_N.
        * Reduce future contributions: c′ ← c − β.
        * Update the neighbor list: S_N ← {w ∈ V : n ∈ S_N, {n, w} ∈ E}.
    endwhile
endfor
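Algorithm 3 can be sketched in Python as follows. The function name is ours, and one assumption is labeled explicitly: this sketch visits each node at most once per seed, whereas the thesis allows contributions to travel around loops as long as the decaying contribution remains positive.

```python
def text_relevance(succ, labels, term, beta=0.5):
    """Relevance-flow sketch: nodes whose label contains the search term seed a
    contribution that spreads to neighboring nodes, decaying by beta per hop.
    Assumption: each node receives at most one contribution per seed (no
    re-entry through loops), a simplification of the thesis algorithm."""
    eps = {v: 0.0 for v in succ}
    neighbors = {v: set() for v in succ}  # relevance flows along edges both ways
    for v, ws in succ.items():
        for w in ws:
            neighbors[v].add(w)
            neighbors[w].add(v)
    seeds = [v for v in succ if term.lower() in labels.get(v, "").lower()]
    for v in seeds:
        c = 1.0
        frontier = {v}
        visited = set()
        while c > 0 and frontier:
            for n in frontier:
                eps[n] += c
            visited |= frontier
            c -= beta
            frontier = {w for n in frontier for w in neighbors[n]} - visited
    return eps
```

With β = 0.5, a matching node contributes 1.0 to itself and 0.5 to its immediate neighbors before the contribution is exhausted.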
Two methods are described: the collapse method, which incrementally reduces the
graph until a threshold value for ε is reached, and the decimation method, which re-
moves all nodes below a threshold value and reconstructs the paths between remaining
nodes. The threshold value is assigned by the user and is called the alpha-cut value,
denoted α .
The principle behind the collapse technique is to incrementally reduce the graph
using the production rules shown in Figure 6.17. Each incremental change to the graph
is selected on the basis of removing the least relevant node (minimum ε) from the
current model G_R^n to produce the next, G_R^{n+1}. The following conditions must be
met to ensure well-formed results. A non-join node is selected for removal at each
increment. A split node is only selected if its predecessor is a task. The removal is
performed by merging the node with its predecessor. Split and join decorators are
removed from a node when a single inflow or outflow, respectively, results from the
collapse. Removing the split and join decorators yields a sequence operation.
One advantage of the collapse technique is that the order of collapses can be stored.
The inverse operation of a collapse, called a node-split, can then be performed to
restore G_R^{n+1} to G_R^n. This is similar to progressive meshes [43] in computer graphics.
Another advantage is that since collapses relate one level of detail to another, the pre-
sentation can animate changes to increase interpretability of the technique. The cal-
culation of collapses can be performed in a pre-processing step and since the actual
collapse operation requires minimal processing, the visualization system can allow in-
teractive navigation between various levels of detail.
The decimation approach selects a number of nodes that will be included in GR. All
other nodes are removed. The original graph is then analyzed to reconstruct the paths
[Table: production rules for the patterns Sequence, Selection, Parallel, Multi-choice, and Iteration, each shown in its original YAWL form and its reduced form (introducing ε)]
Figure 6.17: Production rules for reducing a YAWL graph
between the remaining nodes. Nodes are selected for inclusion if their relevance is α
or higher. A concurrent path is defined as any path from one node to another where a
split exists on the path that was not synchronized before reaching the destination node.
A direct path from x ∈ S to y ∈ S is a path from x to y without going through any other
element of S. The decimation-construction algorithm is given as follows:
Algorithm 4 Business Process Specification Decimation Algorithm
* Initialize the set of included nodes, S_I ← {s, t}.
* Add all v_i where C(v_i) > α to S_I.
* Initialize the output edges, E_R ← ∅.
for x ∈ S_I, y ∈ S_I, x ≠ y
    * V_R ← V_R ∪ {y}
    if there is a direct path from y to y
        * E_R ← E_R ∪ {yy}
    endif
    if there is a direct path from x to y
        * E_R ← E_R ∪ {xy}
    endif
    if a concurrent path {x..y} includes any z ∈ S_I (z ≠ x, z ≠ y)
        * Add the offending split node(s) before x and y to S_I.
        * Add the matching join node(s) after x and y to S_I.
    endif
endfor
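The core of the decimation approach, reconstructing direct paths between the retained nodes, can be sketched in Python. This is an illustrative sketch under stated assumptions: the function name is ours, and the concurrency-preserving step of Algorithm 4 (re-adding offending split and join nodes) is omitted.

```python
def decimate_edges(succ, included):
    """Decimation sketch: keep only the `included` nodes and add an edge
    x -> y whenever the original graph contains a direct path from x to y,
    i.e. one that passes through no other included node."""
    edges = set()
    for x in included:
        # depth-first search from x, expanding only through non-included nodes
        stack = list(succ.get(x, []))
        seen = set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in included:
                edges.add((x, v))       # reached without crossing another included node
            else:
                stack.extend(succ.get(v, []))
    return edges
```

For example, decimating the chain a → b → c → d down to {a, d} yields the single edge a → d, while retaining c as well blocks the long edge and yields a → c and c → d instead.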
A large business process is shown in Figure 6.18. This graph represents a simpli-
fied version of an actual business process used by an insurance company. Figure 6.19
shows the same graph after a significant collapse-based general simplification. This
reduced graph shows just 13% (6/44) of the nodes, but preserves important structural
features. Animating from the original to the reduced specification aided comprehen-
sion. Figure 6.20 shows a graph that was reduced according to relevancy to the text
“legal” and using the decimation technique. From this graph it is possible to observe
the relationship between relevant nodes.
Figure 6.18: Original graph prior to simplification
Figure 6.19: Reduced specification using collapse approach (α = 2.5)
Figure 6.20: Reduced specification for text query “legal” using decimation approach (β = 0.5, α = 1)
CHAPTER 7
Integration of Core Features
7.1 Introduction
The previous chapters have described the ingredients for managing uncertainty in mod-
eling and visualization. This chapter discusses the architecture that integrates these
ingredients into a functional whole. The result is a coherent and integrated platform
for information uncertainty modeling and visualization.
There are a number of questions that need to be addressed when designing an
architecture to integrate the ingredients. Which components are responsible for
storing the uncertainty details, and how can these be extended? How can users create
mappings between the abstraction models and visual elements? This chapter presents
a design and architecture that answers these questions.
We have developed a prototype system called IvySheet¹ that implements the architecture
presented in this chapter. IvySheet is built using Sun Java Standard Edition
5.0 [1] and has been successfully tested on Microsoft Windows XP SP2, Ubuntu Linux
7.04, and Apple Mac OS X 10.5 (Leopard). It was used for the case studies in Sections 4.6 and
6.4, and the studies in Chapter 9. An extension to IvySheet called GPGPUSheet was
also developed, which is described in Section 8.4. This extension demonstrates the
extensibility of the architecture described here.
This chapter is organized as follows. Section 7.2 discusses design considerations.
Section 7.3 then presents the architecture for an integrated system with illustrations
from the IvySheet prototype.
¹The name IvySheet is derived from “Information Visualization Spreadsheet”.
7.2 Design Considerations
This section describes the design of the system. The aim of the design is to be ex-
tensible, flexible, and intuitive to use. Users are able to model and visualize their in-
formation within the same system and it automatically propagates uncertainty details
throughout the model, including to the visualizations. Uncertainty details are protected
against becoming separated, which reduces common user mistakes. Changing uncer-
tainty modeling techniques is a simple process, requiring only a change to a single
field.
The user interface is designed to be familiar to users of current commercial spread-
sheet systems. The design for the user interface is shown in Figure 7.1. The menu bar
provides access to various commands, such as renaming or saving the current spread-
sheet. The text interface area beneath the menu bar is used to enter and edit cell con-
tents. These are entered as text strings, which are then parsed and converted to a cell
of the appropriate type. A workbook contains several sheets and these are accessed
by tabs bearing their name. The scrollable sheet view allows users to interact with the
currently active sheet. When one of the sheet selection tabs is clicked, it brings the
appropriate sheet into view. The currently active cell is indicated by a cursor, which
users control using either the mouse or the keyboard.
[Diagram labels: menu bar (File, Sheet, Cell, Plugins), text interface area, sheet selection tabs, scrollable sheet view, currently active cell]
Figure 7.1: Design of the User Interface
The sheet view consists of a matrix of cells. The cells are addressable using letters
(A..Z, then AA..ZZ, and so on) for the column and a number for the row. For example,
the top left cell is “A1” and the cell in the third column of the 50th row is “C50”. Each
cell can contain data, including any associated uncertainty information. The drawing
of a cell in this view is handled by the cell type object, which is also responsible for
storing the information.
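The cell addressing scheme described above can be sketched in Python. This is an illustrative sketch of standard bijective base-26 spreadsheet addressing; the function names are ours, not part of IvySheet.

```python
def column_index(letters):
    """Convert a column label (A..Z, then AA..ZZ, and so on) to a 0-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch.upper()) - ord("A") + 1)
    return n - 1

def parse_address(address):
    """Split an address such as 'C50' into 0-based (column, row) indices."""
    i = 0
    while i < len(address) and address[i].isalpha():
        i += 1
    return column_index(address[:i]), int(address[i:]) - 1
```

Thus “A1” maps to the top-left cell (0, 0), and “C50” to column 2, row 49.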
Internally, the system consists of four core components and an unlimited number
of plug-in components. The main core component is the Kernel, which forms the inter-
face between all other components. This relationship is illustrated in Figure 7.2. The
other core components are the Language component, which is responsible for formula
processing; the Dependency component, which manages a graph of cell dependencies
that arise as a result of formula use; and the Spreadsheet Datastructure, which holds the
cell objects in an addressable matrix data structure. The Language component includes
the uncertainty propagation models and uncertainty abstraction models.
[Diagram: the Kernel at the center, connected to the Spreadsheet Datastructure, the Dependency Component, the Language Component, and the Plugins]
Figure 7.2: The Spreadsheet Architecture
There are three main tasks that the user performs:
1. Selecting the currently active cell
2. Altering the currently active cell, either by:
(a) Editing a textual description,
(b) Using a custom editing tool supplied by a plugin,
(c) Deleting the contents of the currently active cell, or
(d) Using Cut/Copy/Paste to remove/move/copy the currently active cell
3. Adding/Removing/Altering the sheets in the Spreadsheet Datastructure
When the contents of a cell changes there are two tasks that the system performs.
First, if it is a textual edit, then the new text is parsed by the cell types to build a new
cell object. Next, the dependency component is notified of the change, which causes
affected cells to be recalculated.
All of the core components are required for the system to function. In contrast,
plug-ins are optional and can be loaded at any time to build up the functionality of
the spreadsheet. Plug-ins are used to introduce new cell types, language functions,
uncertainty propagation and abstraction models, and visual elements. Additionally,
plug-ins can add items to the menu, referred to as user aids. For example, saving,
loading, cut, and paste operations are handled by user aids.
The advantage of this design is that it is flexible, extensible, and intuitive. The
visualization sheet provides a flexible mechanism for mapping information to visual
elements, enabling users to explore using all of the benefits that the formula language
provides. This approach keeps the interface consistent and brings the power of formu-
lae to the visual mapping process. The system is built around extensibility. All non-
essential functions are provided by plug-ins, including all of the features required to
provide new uncertainty modeling techniques. Finally, the operation of the spreadsheet
is designed to be consistent with the existing spreadsheet paradigm. This increases the
intuitiveness for new users as much of their existing knowledge can be applied.
7.3 Architecture
This section details the architecture of IvySheet, which follows the design given in the
previous section. The overall system can be divided into three main parts²: the core
components, the user interface, and the plugin components. The core components pro-
vide the essential spreadsheet infrastructure; the user interface provides an interface
for the user to interact with the system; and the plugin components provide function-
ality for cell data types, formula operations, etc. Figure 7.3 illustrates this high-level
architecture using Unified Modeling Language (UML) notation [8].
[UML packages: user.interface, core, plugins]
Figure 7.3: High-level Architecture
7.3.1 User Interface
The user interface consists of the main window and the scrollable spreadsheet view.
The main window contains the text field and sheet selectors. The scrollable spreadsheet
view is a sub-window of the main window and displays the ruler and grid of cells. It
consists of two classes, the display class and the controller class. The display class
is used to manage the window output, while the controller class is responsible for
responding to user input. This follows the Model-View-Controller architecture that is
used by Java’s Swing³.
²In keeping with Java naming conventions [1], the names of these components are in lower case.
³Java Swing is the user interface library that IvySheet uses.
(a) View (b) Controller
Figure 7.4: View and Controller Classes for the Spreadsheet
Figure 7.4 shows the UML diagrams for the view and controller classes that manage
the spreadsheet view. The UISpreadSheet class is only responsible for drawing the
grid and rulers. The actual cell contents are drawn by the cell type classes themselves,
which are added by plug-ins.
7.3.2 Core Components
The core components consist of the kernel object and three main components: the data
model component; the dependency component; and the language component. The
kernel is responsible for loading and managing the plugin components.
Figure 7.5 shows a UML diagram of the core components. The datamodel com-
ponent holds the spreadsheet data structures. The information managed by this com-
ponent completely describes the current spreadsheet contents. The dependency com-
ponent manages the dependencies between cells, which are ordinarily created by for-
mulae. Cells are notified of changes to their dependencies, which gives them a chance
to update themselves. The dependency graph can be generated from the information
contained in the data model and therefore does not need to be stored to disk. The
language component manages the parsing and execution of formulae, look up of the
appropriate methods in the propagation models, and the cell addressing scheme. Each
of these components are described next.
7.3.2.1 The Kernel
There is only one Kernel object per running instance of the spreadsheet system. It
maintains a list of the available cell types and provides the interface for registering
[UML: the core package contains the Kernel together with the datamodel, dependency, and language components]
Figure 7.5: The Core Components
new cell types. It is accessed through two different interfaces: the IKernel interface
provides the standard communications functions that most components use, while the
IKernelRegister interface is only used when registering new plugins. The IKernelReg-
ister extends the IKernel interface and therefore is a superset, as shown in Figure 7.6.
Figure 7.6: UML Inheritance Diagram for the Kernel Class
The IKernel interface is used by running components. The IKernelRegister
interface is only used when registering new plugin components. It grants access to
methods that register new features of the system. Kernel is the actual kernel class. It
keeps a list of the novel cell types sorted in order of parsing priority. Cell types with a
higher parsing priority are checked first when parsing a string.
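The parsing-priority mechanism can be sketched in Python (IvySheet itself is written in Java; this sketch, its method names, and the tuple-based cell representation are ours, meant only to illustrate the registration and priority-ordered parsing idea).

```python
class Kernel:
    """Sketch of the kernel's cell-type registry: plugins register cell types
    with a parsing priority, and text entered into a cell is offered to each
    registered type in priority order until one accepts it."""
    def __init__(self):
        self._cell_types = []                       # (priority, parser) pairs

    def register_cell_type(self, parser, priority=0):
        """IKernelRegister-style call: used only when loading a plugin."""
        self._cell_types.append((priority, parser))
        self._cell_types.sort(key=lambda p: -p[0])  # higher priority checked first

    def parse(self, text):
        """IKernel-style call: build a cell object from a text string."""
        for _, parser in self._cell_types:
            cell = parser(text)
            if cell is not None:
                return cell
        return None
```

For example, registering a number type at a higher priority than a plain-text type makes "42" parse as a number while any non-numeric string falls through to text.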
7.3.2.2 The Datamodel Component
The Workbook holds the cells of the spreadsheet. This forms all of the data necessary
for persistent storage. Since the dependency graph can be derived from the formulae,
it is not stored by the spreadsheet data structure and is instead held by the dependency
component. The data structure is as follows:
Workbook = {SheetName → Sheet}
Sheet = X × Y → Cell
Cell = NIL | CellType

where CellType is the base class for all cell objects. The Sheets are sparse, meaning
that they can contain empty cells.
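The sparse data structure above can be sketched in Python (the class and method names are ours; IvySheet realizes this in Java with Workbook, SpreadSheet, and CellContainer classes):

```python
class Workbook:
    """Sketch of the sparse data model: Workbook = {SheetName -> Sheet},
    Sheet = (x, y) -> Cell, with absent keys standing for NIL (empty) cells."""
    def __init__(self):
        self._sheets = {}

    def sheet(self, name):
        # sheets are created on first access in this sketch
        return self._sheets.setdefault(name, {})

    def set_cell(self, name, x, y, cell):
        self.sheet(name)[(x, y)] = cell

    def get_cell(self, name, x, y):
        return self.sheet(name).get((x, y))   # None plays the role of NIL
```

Because only occupied coordinates are stored, an almost-empty sheet costs almost nothing, which is what makes the sparse representation attractive.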
Figure 7.7 shows the main classes in the datamodel component. These are the
Workbook, which contains multiple SpreadSheets, which contain an array of CellCon-
tainers. CellContainers hold three optional units of information: the cell contents, the
overridden attributes, and a runtime reference to the dependency graph node. If the
contents is nil, then this cell is empty. However, it may still have formatting attributes
applied to it, such as borders or shading.
Figure 7.7: Main Classes in the Datamodel Component
7.3.2.3 The Dependency Component
Functional relationships between cells are created by formulae. When a cell is updated,
then all dependent cells must also be updated. The dependency component manages
these relationships using a two-way linked graph. The downstream links point to cells
that are dependent on this cell, while the upstream links point to cells that this cell
is dependent on. Thus, downstream links only exist for a cell if it is referenced in a
formula elsewhere and the upstream links only exist for a cell if it contains a formula.
Figure 7.8 illustrates the relationship between the dependency graph, the cells, and
the cell containers of the spreadsheet. The dependency graph nodes are depicted as
octagons and have three members: their upstream nodes, which are those that this node
depends on; their downstream nodes, which are the nodes that depend on this one; and
the listener, which is an object that is notified when dependencies change. The cell
container in the datamodel component provides links from cells to their dependency
graph nodes.
Figure 7.8: Relationship of the Dependency Graph to Cells and CellContainers
When a cell is changed in the data model, it notifies the dependency node. The
dependency node then uses an algorithm similar to mark and sweep [97] to notify all
of its downstream nodes. Mark and sweep is an algorithm used in garbage collectors
to deal with issues such as circular links. While circular references are disallowed in
IvySheet, a common pattern is to have diamond shaped dependencies, as illustrated
in Figure 7.9. Cell C is dependent on B1 and B2, both of which are dependent on
A. Under a naive implementation, a change to A would result in C being recalculated
twice. The mark and sweep approach avoids this by ensuring that nodes are only
recalculated once.
Figure 7.9: Cell C is Dependent on A Multiple Times
The mark and sweep algorithm operates as follows. All downstream nodes and,
transitively, their downstream nodes are “marked” by setting a special flag on each
node. Then all nodes are “swept” in a three-step process:
1. If any upstream nodes are marked, sweep them first.
2. Notify the listener for the current node, then unmark the node.
3. Sweep all marked downstream nodes.
As an example, consider a change to cell A from Figure 7.9. It is marked dirty and
then recursively marks its downstream neighbors. B1 is the first neighbor, which is
marked dirty and recurses down to C. C is marked but has no downstream neighbors
and returns back to B1. B1 has no other downstream neighbors and returns back to A.
A’s next downstream neighbor is B2, which is marked dirty. B2’s downstream neigh-
bor, C, is already marked and therefore not marked again. Since both B2 and A have
no other downstream neighbors, the marking process is complete and the sweeping
process begins. A has no upstream neighbors and is therefore recalculated and un-
marked. A’s downstream neighbor B1 is swept next. B1’s only upstream neighbor is
unmarked, thus B1 is recalculated and then unmarked. B1’s downstream neighbor C
is next. However, C cannot be recalculated yet as one of its upstream neighbors, B2,
is marked. Therefore, B2 is swept first. B2’s upstream neighbor, A, is unmarked and
so B2 can be recalculated and unmarked. It is now possible for C to be recalculated
and unmarked. C returns to B1, which returns to A. A’s next neighbor, B2, is already
unmarked and is therefore not swept. This simple example demonstrates that each node
is recalculated only once.
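The walkthrough above can be condensed into a small Java sketch. The class and member names are illustrative, not IvySheet's actual classes, and a counter stands in for notifying the listener; the diamond dependency of Figure 7.9 then recalculates each node exactly once:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the mark-and-sweep recalculation order described above.
class DepNode {
    final String name;
    final List<DepNode> upstream = new ArrayList<>();
    final List<DepNode> downstream = new ArrayList<>();
    boolean marked = false;
    int recalcCount = 0;

    DepNode(String name) { this.name = name; }

    void dependsOn(DepNode other) {
        upstream.add(other);
        other.downstream.add(this);
    }

    // Phase 1: mark this node and, transitively, all downstream nodes.
    void mark() {
        if (marked) return;
        marked = true;
        for (DepNode d : downstream) d.mark();
    }

    // Phase 2: sweep marked upstream nodes first, then recalculate and
    // unmark this node, then sweep marked downstream nodes.
    void sweep() {
        if (!marked) return;
        for (DepNode u : upstream) if (u.marked) u.sweep();
        if (!marked) return; // already swept via an upstream node's downstream pass
        recalcCount++;       // "notify the listener"
        marked = false;
        for (DepNode d : downstream) if (d.marked) d.sweep();
    }

    // A change to a cell marks everything downstream, then sweeps.
    void changed() {
        mark();
        sweep();
    }
}
```

The second `marked` check inside `sweep()` is what prevents a node such as C from being recalculated twice when it is reached both through B1 and through B2's downstream pass.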
There is a single dependency graph object for the workbook. Figure 7.10 shows
the inheritance diagram for the DependencyGraph class. Other components inter-
act with the dependency graph through the IDependencyGraph interface. Formula-
calculated cells trigger the kernel to call addDependencies() to register their dependen-
cies. When a cell is deleted, clearCellDependency() is called. When a cell changes, no-
tifyCellChanged() ensures that dependent cells are updated. When loading a workbook
from disk, the notification system is turned off using setEnableFiring() for performance
reasons. Once the file has been loaded, the system is re-enabled and recalculateAll() is
used to ensure all cells are up to date.
Figure 7.10: UML Inheritance Diagram for the DependencyGraph Class
7.3.2.4 The Language Component
The language component is responsible for formula execution. Formulae form the
logic of the spreadsheet by creating functional relationships between cells.
There are two types of operators: prefix operators and infix operators. Prefix
operators are so named because the operator name comes first; for example, “add(1, 2)”
uses a prefix operator. For infix operators, the operator appears between the two
operands; for example, “1 + 2” uses an infix operator.
When a formula is parsed, it returns a CodeTree. A CodeTree is an n-ary tree data
structure that can be walked to gain a list of the cell references. Algorithm 5 performs
a post-order traversal of the tree to collect cell references. These are entered into the
dependency graph. The CodeTree is a collection of CodeNodes, with one designated
as the root:
CodeTree = { CodeNode[], ∧root }
CodeNode = LeafNode | { functionname, CodeNode[] }
LeafNode = functionname | constant | CellReference
Algorithm 5 Collect-Cell-References( x )
  cellrefs[] = ∅
  for all y in children do
    cellrefs[] += Collect-Cell-References( y )
  end for
  if x is a CellRef or CellRange then
    cellrefs[] += x
  end if
  return cellrefs[]
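This traversal might be sketched in Java as follows; RefNode is an illustrative stand-in for the thesis's CodeNode class, reduced to what the algorithm needs:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Algorithm 5: a post-order traversal of the code tree that
// collects cell references for entry into the dependency graph.
class RefNode {
    final String cellRef;           // non-null only for cell-reference leaves
    final List<RefNode> children;

    RefNode(String cellRef, List<RefNode> children) {
        this.cellRef = cellRef;
        this.children = children;
    }

    List<String> collectCellReferences() {
        List<String> refs = new ArrayList<>();
        // Visit children first (post-order), then this node itself.
        for (RefNode child : children)
            refs.addAll(child.collectCellReferences());
        if (cellRef != null)
            refs.add(cellRef);
        return refs;
    }
}
```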
A node either holds a function name, a constant value, or a cell reference. The tree
is evaluated from the bottom up, with child nodes forming the parameters to the func-
tion. Therefore nodes that contain constants or CellReferences are leaf nodes, which
do not have any children. If a leaf node contains a function, then that function has no
parameters, such as “getCurrentDate()”. The algorithm listed in Algorithm 6 is used
to evaluate a formula. It performs a post-order traversal of the tree to ensure the cor-
rect order of execution. Figure 7.11 shows an example of a CodeTree for the formula
“=5*A1+B7”. Rounded squares represent operations, circles represent constants, and
diamonds represent cell references.
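A minimal Java sketch of this bottom-up evaluation might look as follows. The class is a hypothetical simplification of the thesis's CodeNode, with plain doubles standing in for cell objects and only two operations defined:

```java
import java.util.Map;
import java.util.function.BiFunction;

// Sketch of a code tree node evaluated bottom-up: leaves are constants or
// cell references; interior nodes apply a function to their children's values.
class CodeNode {
    final String func;          // non-null for operation nodes
    final Double constant;      // non-null for constant leaves
    final String cellRef;       // non-null for cell-reference leaves
    final CodeNode[] children;

    static final Map<String, BiFunction<Double, Double, Double>> OPS = Map.of(
        "Add", (x, y) -> x + y,
        "Mul", (x, y) -> x * y);

    static CodeNode op(String name, CodeNode l, CodeNode r) {
        return new CodeNode(name, null, null, new CodeNode[]{l, r});
    }
    static CodeNode num(double v) {
        return new CodeNode(null, v, null, new CodeNode[0]);
    }
    static CodeNode ref(String addr) {
        return new CodeNode(null, null, addr, new CodeNode[0]);
    }

    private CodeNode(String f, Double c, String r, CodeNode[] ch) {
        func = f; constant = c; cellRef = r; children = ch;
    }

    // Post-order evaluation, as in Algorithm 6: children first, then the
    // node's own function applied to their results.
    double evaluate(Map<String, Double> cells) {
        if (constant != null) return constant;
        if (cellRef != null) return cells.get(cellRef);
        double left = children[0].evaluate(cells);
        double right = children[1].evaluate(cells);
        return OPS.get(func).apply(left, right);
    }
}
```

Building the tree for “=5*A1+B7” as `op("Add", op("Mul", num(5), ref("A1")), ref("B7"))` and evaluating it against a map of cell values reproduces the bottom-up order shown in Figure 7.11.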
Algorithm 6 Evaluate-CodeTree( x )
  if x contains a constant then
    return new cell from string
  else if x contains a cell reference then
    return referenced cell
  else
    parameters[] = ∅
    for all y in children do
      parameters[] += Evaluate-CodeTree( y )
    end for
    return func( parameters[] )
  end if

= 5 * A1 + B7 = (5 * A1) + B7 = Add( Mul(5, A1), B7 )
Figure 7.11: Example Formula and its CodeTree

A formula cannot produce a circular reference. In other words, it cannot depend
on a cell that depends on the result of this formula, because this would cause an
infinite loop. Attempts to create a circular reference should be detected and treated
as an error. The formula language definition is given in Extended Backus-Naur Form
(EBNF) in Figure 7.12.
Formula   = Element | Attribute
Element   = Const | CellRef | CellRange | Op
Attribute = ( Element ’.’ attribute-name ) | ( Element ’(’ attribute-name ’)’ )
Const     = string | number
CellRef   = [ sheet-name ’!’ ] column row
CellRange = CellRef ’:’ CellRef
Op        = PrefixOp | InfixOp
InfixOp   = Formula ( ’+’ | ’-’ | ’*’ | ’/’ ) Formula
PrefixOp  = funcname ’(’ Formula ’,’ Formula ’)’
Figure 7.12: Formula Language Definition
Figure 7.13 shows the public methods for the CodeTree class. The constructor
takes a formula string as its parameter, which it parses to produce the tree nodes.
The evaluate() method results in a cell object and is called to evaluate the CodeTree.
getAsString() converts the CodeTree back into string form. This is used to provide an
editable string representation to the user. The getDependencies() method returns a list
of cells that this formula depends on, which is used to update the dependency graph.
Figure 7.13: UML Diagram for the CodeTree Class
The language component manages the uncertainty propagation model. Figure 7.14
shows the model classes. The PropagationModel class manages a list of propaga-
tion methods. The PropagationModelSet combines multiple PropagationModel ob-
jects and presents them as a single list. Figure 7.15 shows the IPropagationMethod
interface, which represents a propagation method. The sole purpose of a propagation
method is to take a list of parameter cells and return the resulting cell.
Figure 7.14: UML Inheritance Diagram for the Propagation Models
Figure 7.15: The IPropagationMethod Class
7.3.3 Plugin Components

The plugin components are used to build the functionality of the spreadsheet system on
the foundation provided by the core components. They are responsible for adding cell
types, formula operations and propagation methods, visual elements, and menu items
(user aids). A plugin class is required to manage the loading process. It implements the
IPlugin interface and uses the methods provided by the IKernelRegistration interface
of the kernel object. Figure 7.16 shows the methods of the IPlugin interface. The
register() method returns a short human-readable name string and the getAboutString()
method returns a detailed description.
Figure 7.16: The IPlugin Interface
One of the principal uses for Plugins is to add cell types to the system. Cell type
objects are responsible for managing the storage and display of cell contents. All cell
type classes must implement the ICellType interface, which is shown in Figure 7.17.
The kernel object invokes the isValidString() method whenever users enter or edit the
cell contents. If the method returns true, the buildFromString() method will be invoked.
Otherwise, the kernel will repeat this test with the other cell types. The getPriority()
method is used to determine the order in which isValidString() is called. A cell type
with a higher priority will be called first.
Figure 7.17: The ICellType Interface
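The priority-ordered dispatch the kernel performs can be sketched as follows. The interface and handler names here are illustrative simplifications of ICellType, and the “10 +- 5” interval syntax is taken from the prototype's examples:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the kernel's cell-type dispatch: types are tried in priority
// order, and the first whose isValidString() accepts the input builds the cell.
interface CellHandler {
    int getPriority();                       // higher priority is tried first
    boolean isValidString(String s);
    Object buildFromString(String s);
}

class TypeRegistry {
    private final List<CellHandler> types = new ArrayList<>();

    void register(CellHandler t) {
        types.add(t);
        types.sort(Comparator.comparingInt(CellHandler::getPriority).reversed());
    }

    Object build(String s) {
        for (CellHandler t : types)
            if (t.isValidString(s))
                return t.buildFromString(s);
        return s; // fall back to treating the input as plain text
    }
}

// Example handler: accepts strings such as "10 +- 5" as intervals.
class IntervalHandler implements CellHandler {
    public int getPriority() { return 10; }
    public boolean isValidString(String s) { return s.contains("+-"); }
    public Object buildFromString(String s) {
        String[] p = s.split("\\+-");
        return new double[]{ Double.parseDouble(p[0].trim()),
                             Double.parseDouble(p[1].trim()) };
    }
}

class NumberHandler implements CellHandler {
    public int getPriority() { return 5; }
    public boolean isValidString(String s) {
        try { Double.parseDouble(s.trim()); return true; }
        catch (NumberFormatException e) { return false; }
    }
    public Object buildFromString(String s) { return Double.parseDouble(s.trim()); }
}
```

Because the interval handler is ranked above the number handler, “10 +- 5” becomes an interval, while “7” falls through to the number handler.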
The cell type objects are responsible for drawing the contents of their cell, which
is performed by the drawContents() method. The getEditableString() method returns
a string that users can edit. This string is displayed in the text edit field in IvySheet.
getTypeDescription() returns a human readable short name of the type of data that the
object handles. There are two interfaces that a cell type can implement to specify
special behavior. These are IIndirectCell and ISpecialEditCell.
IIndirectCell is an interface that is used by cell types that wish to return another cell
type when referenced in formulae. Formula cells are themselves an example of
an indirect cell, where the contents of the cell is a formula, but the referenced
value should be the result of the formula. There is a single method that returns
the indirect cell contents.
ISpecialEditCell can be used to provide a dialog for editing the cell contents. This
enables custom user interfaces to be developed to aid users with editing uncer-
tainty details. It consists of a single method that returns a boolean flag indicating
whether or not the contents were successfully changed.
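Indirect resolution can be sketched as a loop that dereferences until a non-indirect cell is reached. The types and the method name getIndirectContents() are assumptions for illustration; the thesis only states that the interface has a single method returning the indirect cell contents:

```java
// Sketch of indirect-cell resolution: a reference follows the indirection
// until it reaches a non-indirect cell (hypothetical types, not IvySheet's).
interface Cell { }

interface IndirectCell extends Cell {
    Cell getIndirectContents();
}

class ValueCell implements Cell {
    final double value;
    ValueCell(double value) { this.value = value; }
}

class FormulaCell implements IndirectCell {
    private final Cell result;               // result of evaluating the formula
    FormulaCell(Cell result) { this.result = result; }
    public Cell getIndirectContents() { return result; }
}

class Resolver {
    // The referenced value of a formula cell is its result, which may itself
    // be indirect, so dereferencing loops until a concrete cell appears.
    static Cell resolve(Cell c) {
        while (c instanceof IndirectCell) {
            c = ((IndirectCell) c).getIndirectContents();
        }
        return c;
    }
}
```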
The plugins also add uncertainty abstraction model handlers. Support for the DUM
is built into IvySheet and is provided through the IDUMMethod interface shown in
Figure 7.18. Both the possibilistic and probabilistic views return an UncertaintyRangeSet
object, the details of which are shown in Figure 7.19. The UncertaintyRangeSet
is a set of UncertaintyRanges. An uncertainty range describes the uncertainty for a
range of potential collapses. This simplifies programming tasks for developers of vi-
sual elements, since they only need to consider the values that are in the uncertainty
range set. For convex data there will only be one uncertainty range in the set. Visual
elements can be designed to make use of the UncertaintyRangeSet, thereby making
them independent of the particular cell type that they are passed.
Figure 7.18: The IDUMMethod Interface
Figure 7.19: The UncertaintyRange and UncertaintyRangeSet Classes
The visual element objects represent nodes in a scene graph and implement the IVi-
sualElement interface shown in Figure 7.20. The build() method is invoked whenever
changes are made to the layout of the visualization sheet that affect this element. It is
passed an array of cells that are on the same row of the visualization sheet. These cells
typically include formulae that source information from elsewhere in the workbook
and transform it into an appropriate form. When any of the dependent cells change, the
visual element is notified through the inherited notifyDependenciesChanged() method.
The getTopNode() method returns the scene graph node that is managed by this visual
element.
Figure 7.20: UML Inheritance Diagram for the IVisualElement Interface
The final type of extension that the plugins can provide are user aids. These are
items that appear in the menus, such as saving the workbook to disk. The Action
interface is used to add menu items in Java and user aids must implement this interface.
7.4 Summary

This chapter presented the design and architecture of a system for the modeling and
visualization of information uncertainty. This architecture integrates the ingredients
of Chapters 3-6 together into a coherent and extensible whole. The user experience
is designed to be familiar to spreadsheet users, which reduces training requirements.
Visualization is facilitated by a visualization sheet, which enables users to use formulae
to map information to visual element parameters. The essential infrastructure of the
spreadsheet system is provided by four core components: the kernel, the language
component, the dependency component, and the spreadsheet data structure component.
Usable functionality is provided by plug-ins, which are responsible for uncertainty
encapsulated cell contents, uncertainty abstraction, and visual elements.
CHAPTER 8
Advanced Features and Extensibility
8.1 Introduction

The core features together provide a fully functional and integrated information
uncertainty modeling and visualization system. This chapter investigates advanced features
and extensibility. The advanced features enable users to work at a higher level and with
greater control over its operation. The extensibility enables the system to incorporate
new uncertainty modeling and visualization techniques.
One issue with spreadsheet systems is that large spreadsheets can become difficult
to understand. Problems can often be approached hierarchically by breaking large parts
into smaller sub-parts. To address this mode of operation we introduce the hierarchical
spreadsheet, which enables sheets to be embedded within other sheets, forming a tree
of sheets.
The visualization spreadsheet offers fine grained control over the structure of a
visualization. However, this approach differs from commercial spreadsheet systems,
which offer less flexible, more targeted visualization techniques [26]. We offer embedded
visualizations as an approach that is similar to the present commercial offerings.
Embedded visualizations represent specific visualization techniques that are placed
within a cell. Floating observers are windows that display a cell in a floating window,
which enables them to remain on screen while users navigate around the spreadsheet.
To accommodate user- and domain-specific needs, several parts of the framework
may require customization. For example, these customizations include the ability to
select alternative uncertainty propagation models. Furthermore, uncertainty modeling
and visualization techniques continue to appear. For this reason the system needs to
be extensible, enabling new data types, propagation methods, visual elements, and
purpose specific formula functions to be added.
This chapter is organized as follows. Section 8.2 describes the advanced features
of the system, including hierarchical spreadsheets, floating observers and embedded
visualizations, and customization options. Section 8.3 covers extensibility and lists the
steps needed to add a new information uncertainty modeling technique. Section 8.4
contains a small case study that illustrates how the extensibility of the system can be
exploited to provide a tool to aid GPGPU algorithm development.
8.2 Advanced Features

This section describes three advanced features of the system in the following order:
first, hierarchical spreadsheets; then embedded visualizations and floating observers;
and finally, the customization capabilities.
8.2.1 Hierarchical Spreadsheets

The term hierarchical spreadsheet is used here to refer to spreadsheets arranged in a
tree structure. Such a structure can be useful, since lower-level spreadsheets can be
used to provide greater levels of detail, while higher-level concepts can be explored
in the higher-level sheets. Particularly, high-level planning can be conducted, leaving
details to be defined later in lower-level sheets. There are two major issues that need
to be solved in order to support a hierarchical structure. The first is how to build the
structure in the first place, and the second is how to pass information between parent
and child spreadsheets.
The first issue can easily be addressed by generalizing the encapsulation approach
to embed a sheet within the cell of another sheet. Using this method, the hierarchical
structure is implicit, because the parent spreadsheet contains the child spreadsheet.
Prior work, such as the ASP [95], mentions the capability of embedding a spreadsheet
within a cell. However, it does not go on to solve the second issue of how to
exchange necessary information with the parent sheet. The principal problem with
their approach is that the cell contains a spreadsheet object but it is not specified how
to interface with such an object. Either a formula must include knowledge of the inter-
nal working of the spreadsheet, or the spreadsheet object must have generic methods
for interrogation, or both.
If the child spreadsheet is viewed as a function that returns a value, the problem
becomes simpler and more intuitive. It is simpler because we only need to provide
interrogation for the return value. It is more intuitive because this fits with the spread-
sheet paradigm: formulae are functions that return a single value. This capability can
be provided by generalizing further components of the thesis: the representative value
of the spreadsheet is the return value. One embodiment of this approach is to make the
upper left cell (at address A1) the representative value.
Spreadsheets in general rely on a pull paradigm, where information is pulled in as
required1. This “pull” mechanism is achieved using cell references in a formula. Fur-
ther, commercial spreadsheet packages support the pulling of information from cells
in other sheets of the same workbook. For example, Microsoft Excel uses the exclamation
mark to signify a sheet name prefix to a cell address [26]. However, this was
not designed for hierarchical spreadsheets and only allows for absolute addressing.
Computer file systems have long supported a hierarchical arrangement with both abso-
lute and relative addressing. By combining the address prefix scheme of commercial
spreadsheets with the relative addressing scheme of file systems, the child spreadsheet
can pull information from parent-, sibling-, and child spreadsheets in a flexible manner.
Thus far the editing of novel cell types is typically done through the text field. An
embedded spreadsheet can be defined by text; for example, as a map from addresses
to contents and attributes. However, it is not intuitive to edit this text based definition
as a means for updating the spreadsheet. Therefore, it is desirable to allow the user to
edit the spreadsheet using the spreadsheet view.
8.2.1.1 Prototype System
Figure 8.1 shows a prototype implementation of the hierarchical spreadsheet extension.
A new cell type is added, called the SpreadsheetCell. Adding a SpreadsheetCell creates
a new spreadsheet in the workbook. The unique name of the embedded sheet uses the
parent spreadsheet name as a prefix with an exclamation mark as the separator. This
strategy can be employed recursively using the exclamation marks to separate each
sheet name. The traditional approach allows users to select any sheet from a linear list
of sheets. A more intuitive means for selecting sheets in a hierarchical structure is to
use a tree control.
Table 8.1 compares the SpreadsheetCell to an IntervalCell. Both cell types store
more than a single value: The IntervalCell encapsulates the middle number and the
margin of error; while the SpreadsheetCell links to another sheet in the workbook. The
representative value for the SpreadsheetCell is its upper left hand cell. It is possible
for that cell to itself contain a SpreadsheetCell, allowing arbitrary depth. The editing
interface for the interval is to use the text edit field at the top of the window. Such a
text interface is unintuitive for editing an entire spreadsheet. Therefore, when the user
wishes to edit the contents, the embedded spreadsheet is displayed. To insert a sheet at
the current cell, the user can type “Spreadsheet(name)” into the text edit field, where
1Some researchers have experimented with push operations in spreadsheets, e.g. [65]
Figure 8.1: Hierarchical Spreadsheet Prototype. The Parent Sheet (Left) Contains theChild Sheet (Right)
name is the unique suffix of the new sheet’s name.
Category              IntervalCell              SpreadsheetCell
Information stored    Middle number and         Reference to the child
                      error bounds              spreadsheet
Representative value  Middle number             Representative value of the
                                                cell at A1 of the child sheet
Editing interface     Text edit field           Jump to the spreadsheet
Table 8.1: Comparison of IntervalCell and SpreadsheetCell
The formula language is also updated to allow relative addressing. Table 8.2 il-
lustrates four examples of cell addresses and their interpretation within the addressing
scheme. The first two are absolute addresses and can be found in most commercial
spreadsheet programs. The second two are relative addresses, incorporating concepts
from file paths.
Example      Description
B7           Cell in column B, row 7, of the current sheet
June!B7      Cell in column B, row 7, of the sheet named June
!June!B7     Cell in column B, row 7, of the child sheet named June of the current sheet
!..!June!B7  Cell in column B, row 7, of the sheet named June that is a child of this
             sheet’s parent (i.e. a sibling sheet)
Table 8.2: Examples from the Prototype Addressing Scheme
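A resolver for this addressing scheme can be sketched as follows. The code is illustrative, not the prototype's actual parser: a sheet's full name uses the exclamation-mark separators described above, and the result is returned as an absolute “sheet!cell” string:

```java
// Sketch of the prototype's hierarchical addressing (Table 8.2): a leading
// "!" makes the path relative to the current sheet, and ".." steps up to the
// parent, as in file-system paths.
class AddressResolver {
    static String resolve(String currentSheet, String address) {
        String[] parts = address.split("!");
        if (parts.length == 1) {
            return currentSheet + "!" + parts[0]; // plain cell ref: current sheet
        }
        String base;
        int first;
        if (address.startsWith("!")) {            // relative to the current sheet
            base = currentSheet;
            first = 1;                            // parts[0] is the empty prefix
        } else {                                  // absolute sheet name
            base = "";
            first = 0;
        }
        // All segments but the last name sheets; the last is the cell address.
        for (int i = first; i < parts.length - 1; i++) {
            if (parts[i].equals("..")) {
                int cut = base.lastIndexOf('!');
                base = (cut < 0) ? "" : base.substring(0, cut);
            } else {
                base = base.isEmpty() ? parts[i] : base + "!" + parts[i];
            }
        }
        return base + "!" + parts[parts.length - 1];
    }
}
```

With the current sheet “Main!Q1”, the sibling address “!..!June!B7” resolves to “Main!June!B7”, mirroring the last row of Table 8.2.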
This addressing scheme can be used to provide named fields. Named fields are
cells that are referenced using a human readable name rather than their row and column
address. Embedding a spreadsheet with a specific name and then referring to the A1
cell of that sheet using relative addressing achieves the equivalent of a named field.
This is made more intuitive if the language component automatically appends A1 to
relative addresses that end with an exclamation mark. An example is a spreadsheet
that contains an embedded sheet named “Tax”. Cells within the main sheet can refer
to the return value of the child sheet using “!Tax!”. An example formula could read
“= B7 * !Tax!”, indicating that the result is the value of cell B7 multiplied
by the named field “Tax”.
8.2.1.2 Advantages
The addition of the hierarchical spreadsheet feature makes it possible to work at multiple
levels. For example, high-level planning can occur in the main spreadsheet.
Planning and Problem Solving Using a hierarchical spreadsheet system allows the
user to delay the modeling of details. A place holder value can be used during the
planning phase. The user can then return to the place holder later and replace it with
an embedded spreadsheet that contains the details. This process can be applied re-
cursively, progressively refining details of the user’s data model. The detail sheets can
incorporate sub-calculations and metadata. For example, they may contain annotations
describing rationale, measurement procedures, or other relevant information.
More Complex Problems As the size of a spreadsheet grows, the user typically sep-
arates related modular parts into individual sheets. This temporarily helps to manage
the complexity; however, the list of sheets is linear and it is up to the user to maintain a
mental map of the structure. At greater levels of complexity, navigating and understanding
the spreadsheet becomes increasingly difficult for the user.
The hierarchical spreadsheet structure more naturally represents typical decomposition
of problems.
Summarization and Abstraction There are many problems for which spreadsheets
are used that naturally lend themselves to a hierarchical structure. For example, chrono-
logical problems are usually broken down into hierarchical time units. Years can be
broken down into months, which break down to weeks, and so on.
8.2.2 Floating Observers and Embedded Visualizations

A floating observer is a free-floating online view of a cell. Free-floating means that it is
placed within its own window, such that it can be resized and freely placed anywhere
on the display. The online property refers to the immediate update of the observer
whenever the contents of the observed cell changes.
Floating observers enable an alternative to the visualization sheet approach. Rather
than drive the visualization through a visualization sheet, the visualization system is
implemented through novel cell types. This enables specific purpose visualization
techniques, such as line graphs or pie charts, to be defined using the same mechanism
developed for information uncertainty cell types. The display canvas of the visualiza-
tion system is the cell area, thus embedding visualizations within the spreadsheet. This
is similar to Information Visualization Spreadsheets [46]. Users can configure a float-
ing observer so that the visualization continues to be visible after they navigate away
from the sheet that holds the visualization cell.
Figure 8.2 shows a floating observer that is connected to a cell containing an uncer-
tainty line graph. The uncertainty line graph celltype uses the UUM. The uncertainty
space for each variable in the data set is sampled and shaded polygons are connected
between variables. The data set is specified by the fourth parameter as a range of cells.
In this case, the data set consists of intervals, producing an opaquely shaded poly-
gon. As an added feature, the uncertainty line graph celltype detects when the cell is
too small for the labels to fit and hides them. This enables many such graphs to be
readable when packaged into small areas, allowing “small multiples” [118] style tiled
visualization.
Figure 8.2: Floating Observer, Observing the Uncertainty Line Graph in Cell D14
The floating observer is a non-addressable cell that exists within the dependency
tree, although it must be a leaf node. The dependency tree is used to ensure that it
is updated whenever the cell contents changes. Figure 8.3 shows the dependency tree
for the floating observer of Figure 8.2: the observer is external to the spreadsheet data
structure, but is dependent on the cell in the spreadsheet.
Figure 8.3: Dependency Tree for the Floating Observer from Figure 8.2
Commercial spreadsheet systems (e.g. Microsoft Excel [26]) currently use graph-
ical user interfaces for managing visualizations. We use a special edit mode that is
provided by the visualization celltypes to present equivalent graphical dialogs for their
configuration. Commercial spreadsheets also present wizard interfaces for inserting
new visualizations and the equivalent can be achieved here using user aids.
The advantage of the floating observer and embedded visualization approach is that
it offers a visualization interface that is more consistent with existing commercial ap-
plications. However, unlike popular commercial offerings, the visualization is still held
within a cell and is therefore addressable. When several embedded visualizations are
placed in cells next to one another, they form a tiled visualization. The disadvantage
of the embedded visualization approach is that it is not as flexible as the visualization
sheet. Flexibility can be an important requirement for information uncertainty visu-
alization. As the information uncertainty visualization techniques mature, we would
expect that popular uncertainty visualization methods will be distilled into celltypes
for use in embedded visualization.
8.2.3 Customization

Different users have different requirements for their uncertainty modeling and
visualization. For example, in some large datasets it may be desirable to simplify
uncertainty propagation at the expense of precision, while some mathematical models may
require more precise measures. In addition, some application domains may require
different treatment of uncertainty to others. For these reasons it is important that the
system can be customized to meet the needs of users.
Our system can be customized in three ways. Firstly, the uncertainty modeling data
types that are available can be customized. New information uncertainty models can
be added and existing types can be excluded. Secondly, the propagation models can
be changed. This enables users to implement or select propagation models that are
appropriate to their task. Thirdly, different uncertainty abstraction model handlers can
be chosen. For example, one handler might treat intervals as a uniform distribution
when viewed from a probabilistic point of view while another might use a normal
distribution.
8.2.3.1 Configuring the Uncertainty Data Types
Some users and application domains may wish to use a particular set of data types and
exclude others. It is therefore desirable to enable users to alter the list of active cell
types that the running system maintains.
Figure 8.4 shows the dialog that enables users to edit the list of cell types directly.
New cell types can be added to the list by loading the appropriate plugin using the
button provided. Cell types are given an opportunity to handle the string contents of a
cell based on their order in the list and the top list item has the highest priority. Buttons
are provided to rearrange the order of the list.
Figure 8.4: CellType List Editor
Any alteration to the list of active cell types requires that the system adapt the cur-
rent spreadsheet contents to reflect this change. For example, if a spreadsheet currently
contains an interval “10 +- 5” and the IntervalCell type becomes disabled, then that cell
needs to be converted into another cell type. This should be done automatically by the
software.
A trivial implementation is to re-evaluate all strings in the spreadsheet. However,
the cells that need to be re-evaluated can be determined from the changes that are made
to the list. If a cell type was disabled, then all cells of that type must be re-evaluated.
If a cell type was enabled, then all cells whose type is lower in the list must be
re-evaluated. If the order of cell types was changed, then only those cells whose type
lies between the top- and bottom-most changed positions need to be re-evaluated.
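These rules can be sketched as a small planner over the priority list of cell-type names. The code is hypothetical and treats one kind of change at a time, as the text does; for a reorder, the two lists are assumed to contain the same types:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the re-evaluation rules above for the three kinds of change to
// the cell-type priority list; each method returns the affected type names.
class ReevalPlanner {

    // Type disabled: only cells of that type need re-evaluation.
    static List<String> afterDisable(String disabledType) {
        return List.of(disabledType);
    }

    // Type enabled: every type ranked below the new type may lose cells to it.
    static List<String> afterEnable(List<String> newList, String enabledType) {
        int pos = newList.indexOf(enabledType);
        return new ArrayList<>(newList.subList(pos + 1, newList.size()));
    }

    // Pure reorder: only types between the top- and bottom-most moved
    // positions can change which handler claims their cells.
    static List<String> afterReorder(List<String> oldList, List<String> newList) {
        int top = -1, bottom = -1;
        for (int i = 0; i < newList.size(); i++) {
            if (!newList.get(i).equals(oldList.get(i))) {
                if (top < 0) top = i;
                bottom = i;
            }
        }
        if (top < 0) return List.of();            // nothing moved
        return new ArrayList<>(newList.subList(top, bottom + 1));
    }
}
```

Swapping the top two of three types, for instance, leaves the third type's cells untouched, which is exactly the saving over naively re-evaluating every string in the spreadsheet.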
8.2.3.2 Configuring the Propagation Model
The propagation model defines a set of functions that handle operations between val-
ues. The formula language component parses formulae and invokes the appropriate
propagation method according to the propagation model. The invoked method is re-
sponsible for combining the input parameters and returning a result.
There are multiple alternative methods for combining uncertainty information. There-
fore, the user requires the flexibility to control which methods are active from a se-
lection of alternatives. The methods that can be chosen are made available through
plugins. This allows new, esoteric, and domain specific functions to be developed for
the system.
There are two mechanisms that enable users to select propagation methods. The
first allows users to individually manage the method mappings in the propagation
model. A dialog that facilitates this mode of editing in our prototype is illustrated
in Figure 8.5. However, the number of propagation methods tends to be large and this
technique can become cumbersome. The second mechanism uses a layer of abstrac-
tion to group similar propagation methods into a propagation model and group several
propagation models into the currently active propagation model set. The propagation
model set is an ordered set of propagation models, where the order governs the priority
of the propagation model. The higher positioned propagation models override their
lower positioned siblings.
Figure 8.5: Propagation Model Editor
Internally, a propagation model set provides the same signature to method mapping
as a propagation model and fulfills the same contractual obligations from a program-
ming point of view. The UML diagram shown in Figure 8.6 illustrates this relationship,
where the IPropagationModel contract is fulfilled by the PropagationModelSet class.
There is one globally active propagation model set, which holds the currently active
propagation models. Plug-ins register propagation models with the kernel during
initialization. Users can edit the globally active propagation model set using a dialog as
shown in Figure 8.7. This allows the user to enable, disable, and reorder the available
propagation models. The “Edit” button edits the currently highlighted propagation
model using the Propagation Model Editor shown in Figure 8.5.
Figure 8.6: Propagation Model, Method, and Model Set
Figure 8.7: Propagation Model Set Editor
An advantage of the propagation model set approach is that the user can rapidly
switch between different groups of propagation methods. Users can also disable an
entire propagation model, rather than each method individually.
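This composite arrangement can be sketched as follows. Only the IPropagationModel and PropagationModelSet names come from our design; the lookup signature is assumed for illustration:

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Sketch of the composite relationship: a set of models fulfills the same
// lookup contract as a single model (the lookup signature is assumed).
interface IPropagationModel {
    // Returns the handler for an operator between two cell-type names, or null.
    BinaryOperator<Object> lookup(String op, String leftType, String rightType);
}

class PropagationModel implements IPropagationModel {
    private final Map<String, BinaryOperator<Object>> methods = new HashMap<>();
    void register(String op, String l, String r, BinaryOperator<Object> fn) {
        methods.put(op + ":" + l + ":" + r, fn);
    }
    public BinaryOperator<Object> lookup(String op, String l, String r) {
        return methods.get(op + ":" + l + ":" + r);
    }
}

// Higher-positioned models override lower-positioned siblings: the ordered
// list is consulted in priority order and the first match wins.
class PropagationModelSet implements IPropagationModel {
    private final List<IPropagationModel> models = new ArrayList<>();
    void add(IPropagationModel m) { models.add(m); }
    public BinaryOperator<Object> lookup(String op, String l, String r) {
        for (IPropagationModel m : models) {
            BinaryOperator<Object> fn = m.lookup(op, l, r);
            if (fn != null) return fn;
        }
        return null;
    }
}
```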
8.2.3.3 Configuring the Abstract Uncertainty Model Handlers
Abstract uncertainty model handlers implement the interface between the visual map-
ping system and the information uncertainty modeling techniques. These handlers
perform implicit conversion of uncertainty information to fulfill their contract. For
example, when using DUM, a handler will be responsible for providing a possibilis-
tic view of a Gaussian distribution. To accommodate variations in user and application
needs, these handlers need to be changeable.
Figure 8.8 shows a dialog that allows users to change the DUM handler for each
cell type. The left-hand column lists all of the cell types that are currently loaded by
the spreadsheet system. Users can select the currently active DUM handlers in the
right-hand column, but only those cell types that implement the DUM interface are
editable. The area on the right provides a description of the currently selected DUM
handler.
The advantage of selecting abstract uncertainty model handlers is that users can
choose the handler that is appropriate to their task. The handlers are changed at runtime
with immediate effects, enabling users to compare the difference in real-time.
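The runtime selection of handlers can be sketched as a simple registry; the class and method names below are illustrative only:

```java
import java.util.*;

// Illustrative sketch of runtime-swappable handler selection per cell type.
class DumHandlerRegistry {
    private final Map<String, String> active = new HashMap<>();          // cell type -> active handler
    private final Map<String, List<String>> available = new HashMap<>(); // cell type -> offered handlers

    // Plug-ins offer handlers; the first offered handler becomes the default.
    void offer(String cellType, String handler) {
        available.computeIfAbsent(cellType, k -> new ArrayList<>()).add(handler);
        active.putIfAbsent(cellType, handler);
    }

    // Swapping takes effect immediately: the next lookup returns the new handler.
    void select(String cellType, String handler) {
        if (!available.getOrDefault(cellType, List.of()).contains(handler))
            throw new IllegalArgumentException("unknown handler: " + handler);
        active.put(cellType, handler);
    }

    String handlerFor(String cellType) { return active.get(cellType); }
}
```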
8.3 Extensibility
There already exist many ways to represent information uncertainty. However, as
frameworks such as Klir’s generalized information theory mature, more modeling
techniques will continue to appear. For this reason, the requirement for extensibility
was anticipated in our design.
Figure 8.8: Dual Uncertainty Model Selector
This section examines the extensibility capabilities, first
from the point of view of information uncertainty, and then from a general purpose
point of view.
The design of our system is based around a plug-in architecture. There are six ways
in which a plug-in can extend the functionality of the system:
1. New cell types can be added. These are responsible for parsing strings and hold-
ing the contents of cells. Adding CellTypes increases the types of data that can
be stored in a cell.
2. New functions can be added to the formula language. This enables the plug-ins
to define purpose specific functions, such as statistical methods.
3. New propagation methods and models can be added to provide for different
mathematical rule sets.
4. New uncertainty abstraction model handlers can be added, offering users greater
choice of abstraction methods.
5. New visual elements can be added. Visual elements are displayed on the visual-
ization canvas and are used in the visualization sheet.
6. New user aids can be created, which are packages of code that are executed
from the menu. These are the most free-form additions and their uses range from
offering macro-like behavior to the implementation of complete sub-modules.
The extensibility capabilities are exposed to the plug-in at load time through the IKer-
nelRegistration interface. The registration functions are called by the plug-in to notify
the system of the new features it supports. Some features can be registered in a disabled
state, which users can manually enable using the customization dialogs. The scope of
our prototype was limited to only one uncertainty abstraction model (the DUM). How-
ever, this is not a fixed limitation and other implementations may support as many
abstraction models as are appropriate.
We now consider how support for a new uncertainty modeling technique can be
added to the system. The following steps would be taken by the plug-in:
1. Register a new CellType, which is responsible for storing the uncertainty details
in the spreadsheet data model.
2. Add a propagation model to facilitate correct propagation of the uncertainty de-
tails. The propagation model contains a propagation method for each of the op-
erators (e.g. addition, subtraction, etc.). The propagation methods take objects
of the new CellType as parameters and return a new cell as the result.
3. Register appropriate uncertainty abstraction model handlers for the new Cell-
Type.
Users can now enter data using the new uncertainty model. The new cell type will parse
the string and store the information. Any formulae involving these cells will use the
new propagation methods. Visualizations that are built will extract their information
through the newly added uncertainty abstraction model handler. Furthermore, existing
formulae and visualizations will also be compatible, enabling users to convert existing
variables to the new cell type.
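These three steps can be sketched against the registration interface. Only the IKernelRegistration name comes from our design; the remaining types, signatures, and the example technique are assumed for illustration:

```java
// Hypothetical shapes for the three registration steps; only IKernelRegistration
// is named in the thesis, the rest is assumed for illustration.
interface ICellType { String name(); Object parse(String input); }
interface IDumHandler { String describe(); }

interface IKernelRegistration {
    void registerCellType(ICellType type);
    void registerPropagationModel(String name, Object model);
    void registerDumHandler(String cellTypeName, IDumHandler handler);
}

// A plug-in adding support for a new (hypothetical) uncertainty technique
// performs the three steps during initialization.
class TriangularFuzzyPlugin {
    void initialize(IKernelRegistration kernel) {
        // Step 1: register the cell type that stores the uncertainty details.
        kernel.registerCellType(new ICellType() {
            public String name() { return "TriangularFuzzyCell"; }
            public Object parse(String input) { return input; } // parsing elided
        });
        // Step 2: add a propagation model for the operators.
        kernel.registerPropagationModel("TriangularFuzzy", new Object());
        // Step 3: register an uncertainty abstraction model handler.
        kernel.registerDumHandler("TriangularFuzzyCell",
                () -> "possibilistic view of a triangular fuzzy number");
    }
}
```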
Separate to uncertainty modeling techniques, new visual elements can also be reg-
istered. Users access visual elements through the visualization sheet by mapping infor-
mation to them. The ability to add new visual elements increases the sophistication of
the visualization system. For example, it is possible to create a suite of visual elements
that interface to Visualization ToolKit2 objects.
Embedded visualization types are added to the system as new CellTypes. For ex-
ample, a ScatterPlotCell can implement the scatter plot [40] visualization technique.
An example string format for this cell is “ScatterPlot(cellrange)”, where the cell range
defines the cells that are included in the plot. For embedded visualizations there is no
need to add any corresponding propagation or abstraction models.
The methods for extension described so far are not limited to dealing with infor-
mation uncertainty. Only uncertainty abstraction model handlers are specific to in-
formation uncertainty. New cell types can conceivably manage any type of data for
which propagation methods can be used to offer convenient operations. Functions in
the formula language can also be general purpose.
2The Visualization ToolKit (VTK) is a freely available visualization system (see [105]).
8.4 Case Study: GPGPUSheet
This chapter has discussed several advanced features, which are explored in the case
study of the next chapter. The case study presented here investigates the extensibility
of our system to more general purpose problems. A field of research that has
recently developed is General Purpose Graphics Processing Unit (GPGPU) program-
ming. GPGPU exploits the advances in processing power available from the parallel
stream processing architecture found in graphics processors. Examples of use range
from image processing through to fluid flow simulation. Further information is avail-
able from the GPGPU website3, including tutorial sessions that have run at ACM SIG-
GRAPH 2004, IEEE Visualization 2004, and Supercomputing 2006.
Here we present GPGPUSheet, which is a prototype system for creating and visu-
alizing GPGPU applications. GPGPU uses graphics card textures for memory storage
and fragment shader programs to apply convolution kernels. For this purpose we in-
troduce two new cell types: the TextureCell and the KernelCell (see Table 8.3). The
TextureCell wraps an image that can be loaded onto the graphics card as a texture. The
KernelCell is a graphics card fragment shader program, in human-readable code form.
Name          Description
TextureCell   1D, 2D, or 3D image data with 1 (Luminance), 3 (RGB), or 4 (RGBA) components
KernelCell    Fragment shader program
Table 8.3: Novel Cell Types for GPGPU
The spreadsheet approach to GPGPU facilitates comparison tasks, such as effect
choice for parameters or inspection of intermediate states during repeated convolu-
tions. For example, the spreadsheet shown in Figure 8.9 compares results of a GPU
based edge finding algorithm under different parameters. The parameters “threshold”
and “mode” are stored in rows 3 and 4. Row 5 contains the results from applying the
kernel in cell B2 to the texture in cell B1 for each parameter. The convolution kernel
in cell B2 is displayed in the “Cg Code Editor” window.
A convolution kernel is executed by using a function call RunKernel in a formula.
The output is itself a TextureCell, thus allowing the output from one kernel to be the
input for another. The new functions that GPGPUSheet introduces are listed in Ta-
ble 8.4. The parameters of the RunKernel function consist of a comma separated list
of program parameters. The Image functions construct a texture from the numeric
values contained in a cell range. The Pixel functions extract a numeric value from a
particular pixel within a texture. PadTex creates a texture of a particular dimension
and fills it with a cell. If that cell happens to be a texture, then it is tiled. ResampleTex
3The GPGPU website, http://www.gpgpu.org/
creates a scaled copy of another texture.
Function                         Example of Use
RunKernel                        RunKernel(B1, “threshold”=B3, “mode”=B4)
ImageL, ImageRGB, ImageRGBA      ImageRGB(A1:F7)
PixelR, PixelG, PixelB, PixelA   PixelR(B2, 10, 10)
PadTex                           PadTex(B2, 512, 512)
ResampleTex                      ResampleTex(B2, 256, 256)
Table 8.4: Novel Functions for GPGPU
Textures are arrays of pixel data. The arrays can be one-, two-, or three-dimensional,
and each pixel can contain either one (luminance), three (RGB), or four (RGBA) com-
ponents. The components can either be integer or floating point values. In some in-
stances computer graphics hardware can be limited to texture sizes that are a power
of two and the user will need to either pad or resample their data. The PadTex and
ResampleTex functions can be used to overcome this limitation. Textures can either
be built from ranges of cells in the spreadsheet, or loaded from an external file through
the menu option.
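The rounding involved in padding to a power-of-two size can be sketched as follows; this is an illustrative helper, not part of the prototype:

```java
// Illustrative sketch: hardware that only accepts power-of-two textures needs
// each dimension rounded up, which is what a PadTex-style call works around.
class TextureSizing {
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1; // double until it covers n
        return p;
    }
}
```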
The KernelCell implements the ISpecialEditCell interface, allowing it to present
a complete syntax highlighting editor (as shown in Figure 8.9). A menu option to
insert base code for well-known kernels was added to provide a starting point for
developing new convolution kernels.
In conclusion, this case study shows the extensibility of our spreadsheet-based de-
sign to other problems. GPGPUSheet is a valuable tool for assisting research into GPU
based algorithms, such as our work in [14]. It enables the user to visually explore the
effects of an algorithm, to concatenate complex sequences of kernels, and to evaluate
changes to shader programs in real-time. Shader programs and textures were added as
novel cell types, and the formula language was extended to enable GPU bound opera-
tions to be carried out.
Figure 8.9: Prototype system for GPGPU visualization
CHAPTER 9
Evaluation
9.1 Introduction
This chapter provides an analysis of the performance of the system and compares it
to common alternatives. When a user wishes to model information uncertainty there
are two broad methods: the first is to use a numerical approach, which simulates the
uncertainty; the second is to use an analytical approach, where the parameters of the
uncertainty space are managed using mathematical principles. The objective for either
method is for the user to gain an understanding of the uncertainty space affecting their
model.
Numerical approaches typically come in three flavors: manual perturbations of
input variables, such as “what-if” exploration; automated regular perturbations, such
as animated stepping of variables; and random perturbations of input variables using
Monte-Carlo simulation. The results from manual perturbations are typically observed
by the user to mentally map the uncertainty space. Advancements in computing power
have made automated perturbation viable and this technique is now in common use
within certain sectors, including the financial markets sector.
Analytical techniques are available using several software packages. Commercial-
grade spreadsheet systems, such as Microsoft Excel and OpenOffice.org calc, include
statistical operators that enable modeling using normal distributions. Mathematical
packages, such as MatLab, tend to offer a selection of functions for dealing with a
wider range of distributions, but their use is essentially the same. The parameters
needed to model a normal distribution are the mean μ and the standard deviation σ. These
parameters are then passed to statistical functions for interpretation (e.g. see Table 9.1,
drawn from [78]).
Function                    Description
STDEV(range_of_values)      Calculates the standard deviation σ from a range of values
NORMDIST(x, μ, σ, FALSE)    Calculates the probability density at x
NORMDIST(x, μ, σ, TRUE)     Calculates the probability that a value ≤ x will occur
NORMINV(p, μ, σ)            Returns the value with cumulative probability p. This is the
                            inverse of NORMDIST
Table 9.1: A Selection of Normal Probability Functions in Microsoft Excel 2003
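The cumulative form of NORMDIST can be reproduced outside a spreadsheet. The sketch below uses the well-known Abramowitz and Stegun erf approximation (maximum absolute error about 1.5e-7); the class name is illustrative:

```java
// Illustrative sketch of NORMDIST(x, mu, sigma, TRUE) in ordinary code,
// using the Abramowitz-Stegun 7.1.26 approximation of erf.
class NormalDist {
    static double cdf(double x, double mu, double sigma) {
        double z = (x - mu) / (sigma * Math.sqrt(2));
        return 0.5 * (1 + erf(z));
    }

    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1 / (1 + 0.3275911 * x);
        // Horner evaluation of the degree-5 polynomial in t.
        double y = 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t) * Math.exp(-x * x);
        return sign * y;
    }
}
```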
The system designed in the course of this thesis takes an analytical approach to
information uncertainty and is most similar to using a spreadsheet system. In Sec-
tion 4.6 we examined the advantages of our approach over that of using a traditional
spreadsheet. In Section 6.4 we investigated how user-objectives can be used to drive
visualization with a case study in financial decision support. In this chapter we eval-
uate the complete system in three ways: firstly, we perform a quantitative analysis,
comparing our system to traditional numerical and analytical methods; secondly, we
describe user feedback that was gained from two surveys; and thirdly, we investigate
advantages of using our system in a business planning case study.
9.2 Quantitative Analysis
This section examines the performance issues and the potential for user errors. Research
by Brown and Gould [9] has found that errors are common in spreadsheets,
despite a high level of confidence by their creator. Furthermore, the majority of errors
they observed (65%) were due to mistakes in formulae. Reinhardt and Pillay [100]
analyzed errors made by computer literacy students in spreadsheet formulae and found
that 82% made errors in formulae that involved cell addressing. This percentage rose
to 93% for formulae involving financial functions. These observations suggest that
reducing the number and complexity of formulae should form a principal strategy in
substantially reducing the number of errors made.
Another approach to reducing spreadsheet errors is to work in groups. However,
Panko and Halverson [88] found that half of the spreadsheets generated by groups of
four still contained errors. This is an improvement over spreadsheets developed by in-
dividuals (81% down to 50%), but contrary to expectation, group collaboration did not
eliminate the “oversight errors” [88, pp. 6]. Furthermore, some errors were contentious
and although one member of the group recognized the problem, they were unable to
convince the other members of the group. This suggests that, practicality aside, solely
increasing the number of collaborators will not continue to improve the error rate.
In this section we measure the number of operations that the user is required to
perform under several common conditions. The operations are categorized by type
and summarized. The materials used in these experiments are:
• 1 PC
– OS: Microsoft Windows XP SP2 (32bit)
– Java Runtime: Sun Microsystems JRE 1.6.0_01
– CPU: AMD Athlon 64 X2 Dual Core 2Ghz
– RAM: 1GB
– Display: 1680x1050, 32bit color
• OpenOffice.org Calc 2.2.1, a freely available commercial grade spreadsheet
• IvySheet as at March 2007, our prototype system
We consider two modes of address. The first, construction, is where the spreadsheet is
constructed with an anticipation of the information uncertainty modeling requirements.
The second, retrospection, is where the spreadsheet is already built, but at least one
variable needs to be changed to use a different modeling technique.
There are a number of actions that the user can perform, which are categorized in
Table 9.2. Block pasting is an operation where a smaller copy region is pasted into
a larger area. This is typically used to replicate formulae from a template, such as
from a prototype cell to the remainder of a column. Modern spreadsheet software
automatically adjusts cell references in pasted formulae. We count a block paste as a
single copy+paste operation.
Label   Description
α       Enter data into a cell
β       Modify data in a cell
γ       Change layout (e.g. insert a column)
δ       Enter a formula
ε       Modify a formula
ζ       Copy+paste operation
η       Delete (clear) a cell
Table 9.2: Actions of the User
The data entry (α), related updates (β ), and cell removal (η) operations are least
likely to be the source for errors. This is because there are a limited number of things
that can go wrong: either the data is incorrect or the wrong cell was chosen. These three
operations also provide direct visual feedback, where the cell contents directly reflects
the new input. Layout changes (γ) are more likely to cause errors, albeit indirectly.
Not only can the layout change be wrong, but it may also contribute to subsequent
mistakes made due to the changed frame of reference. Formulae (δ and ε) provide
the most opportunity for error. They offer indirect feedback, where the formula is
revealed only on request. This makes it harder to recognize a problem. Similarly,
copy+paste operations (ζ ) will update any copied formulae automatically. Incorrect
formula updates may be hard to spot and often require manual formula inspection.
We do not weigh formula complexity in this evaluation. Our system has compara-
tively simpler formulae than the alternatives that we tested. This is due to automated
propagation. For example, the multiplication of two normal distributions using tradi-
tional means requires the formulae:
var(XY) = var(X) ∗ var(Y) + μ(X) ∗ μ(X) ∗ var(Y) + μ(Y) ∗ μ(Y) ∗ var(X)
μ(XY) = μ(X) ∗ μ(Y)
which is a total of six multiplications and two additions over two formulae. The
same operation is effected by
v′ = v1 ∗ v2
using our system, which is a single multiply in a single formula. While excluding for-
mula complexity biases results against our system, there are two reasons for doing this.
Firstly, it is possible that prudent users are able to define macros for certain commonly
used formulae. This would make the arithmetic complexity measure meaningless. Sec-
ondly, it is easy to introduce bias in the opposite direction. The formulae are mostly
of the same form and therefore it is possible that users will introduce fewer errors than
the complexity might otherwise imply.
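The propagation rule above can be checked numerically. The following sketch applies it to two independent normal variables; the class name is illustrative:

```java
// Illustrative sketch of the analytical product rule for two independent
// normally distributed variables, as used in the formulae above.
class NormalProduct {
    final double mean, var;
    NormalProduct(double mean, double var) { this.mean = mean; this.var = var; }

    static NormalProduct multiply(NormalProduct x, NormalProduct y) {
        // var(XY) = var(X)var(Y) + mu(X)^2 var(Y) + mu(Y)^2 var(X)
        double v = x.var * y.var + x.mean * x.mean * y.var + y.mean * y.mean * x.var;
        // mu(XY) = mu(X) mu(Y)
        return new NormalProduct(x.mean * y.mean, v);
    }
}
```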
9.2.1 Construction Experiments
For the initial experiment a spreadsheet with three inputs is used. The first variable is a
salary increase, the second is the tax rate, and the third is the initial salary. The output
is the net income, calculated over a number of years: netincome_i = salary_i − tax_i, where
salary_i = salary_{i−1} ∗ rate_growth and tax_i = salary_i ∗ rate_tax.
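The recurrence can be sketched directly in code. We assume the growth rate is applied as salary_{i−1} ∗ (1 + rate_growth), matching the spreadsheet in Figure 9.1; the names are illustrative:

```java
// Illustrative sketch of the reference model: net income over a number of
// years, assuming the growth rate is applied as salary * (1 + rate).
class SalaryModel {
    static double[] netIncome(double initialSalary, double growthRate,
                              double taxRate, int years) {
        double[] net = new double[years];
        double salary = initialSalary;
        for (int i = 0; i < years; i++) {
            if (i > 0) salary *= (1 + growthRate); // salary_i from salary_{i-1}
            double tax = salary * taxRate;         // tax_i = salary_i * rate_tax
            net[i] = salary - tax;                 // netincome_i = salary_i - tax_i
        }
        return net;
    }
}
```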
Figure 9.1 shows the spreadsheet without uncertainty information, henceforth re-
ferred to as the reference spreadsheet. In this instance, rategrowth is 7%, ratetax is 17%,
and salary1 is 60000. The construction cost is as follows:
• Text labels consume 6α .
Figure 9.1: Spreadsheet for First Experiment, Without Uncertainty
• Year labels consume 1α +1δ +1ζ , since they are regular.
• Initial variables require 3α .
• Formula for tax and netincome, for year 1 take 2δ .
• Formula for salary in year 2 takes 1δ .
• formulae for salary in years 3-7 takes 1ζ .
• formulae for tax and net income in years 2-7 takes 1ζ .
Total cost is 10α +4δ +3ζ .
9.2.1.1 Test I
Test I-A The first test is to use a normal distribution to model the salary growth
rate: μ = 0.07, and σ = 0.01. Figure 9.2 shows the spreadsheet constructed using our
system. The construction cost is the same as without uncertainty, 10α +4δ +3ζ . The
user simply enters “0.07@0.01” in the salary growth field.
Figure 9.2: Spreadsheet for First Experiment, With Uncertainty
Test I-B Using a traditional analytical approach is shown in Figure 9.3. Two columns
are needed for every variable that depends on the salary growth, one to hold the mean
and another to hold the standard deviation. The construction cost is as follows:
Figure 9.3: Spreadsheet for First Experiment, Analytical Approach Using TraditionalMethods
• Text labels consume 10α .
• Year labels consume 1α +1δ +1ζ , since they are regular.
• Initial variables require 5α , including the initial standard deviation of 0 for salary
in year 1.
• Formula for tax and netincome, for year 1 take 4δ , although the formulae are
more complicated.
• Formula for salary in year 2 takes 2δ .
• formulae for salary in years 3-7 takes 1ζ .
• formulae for tax and net income in years 2-7 takes 1ζ .
Total cost is 16α +7δ +3ζ , an increase of 6α +3δ .
Test I-C The Monte-Carlo method [3] is a numerical approach and takes a different
form. A prototype row is built that contains the entire model in a single row. The salary
rate will now be generated from a normal distribution, using the pseudo-random num-
ber generator. This is achieved using the following formula: “=NORMINV(RAND();μ;σ )”.
The prototype row is then duplicated to rows below an arbitrary number of times,
where each row represents a trial run in the simulation. Finally, the mean and standard
deviation for each of the output variables is calculated. Figure 9.4 shows part of the
spreadsheet. The mean is calculated using the formula “=AVERAGE(Ω1:Ωn)” and
standard deviation using “=STDEV(Ω2:Ω(n− 1))”, where Ω represents the column
and n represents the number of trial runs. The cost is as follows:
Figure 9.4: Monte-Carlo Spreadsheet for First Experiment
• Text labels for all columns (23α: 3 + 2 + 6 × 3).
• First, there are three columns for the input variables: tax rate, salary rate, and initial
salary (2α + 1δ).
• Following this are the calculated columns for tax and net income for year 1 (2δ ).
• Following this are calculated columns for salary, tax, and net income for years 2
through 7 (1δ +2ζ ).
• The prototype row is copied to rows below (1ζ ).
• Calculate mean and standard deviation for salary, tax, and net income and repeat
for all 7 years (6δ +1ζ )
Total cost is 25α +10δ +4ζ , an increase of 15α +6δ +1ζ over our system.
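The Monte-Carlo construction above can be mirrored in ordinary code, where a seeded Gaussian generator stands in for the NORMINV(RAND();μ;σ) formula. This is an illustrative sketch, not part of the prototype:

```java
import java.util.Random;

// Illustrative sketch of the Monte-Carlo approach: each trial row draws the
// growth rate from N(mu, sigma), evaluates the model, and the mean and
// standard deviation are taken over the trials.
class MonteCarloSalary {
    // Returns {mean, standard deviation} of the second year's net income.
    static double[] simulateSecondYearNet(int trials, long seed) {
        Random rng = new Random(seed);
        double mu = 0.07, sigma = 0.01, taxRate = 0.17, salary0 = 60000;
        double sum = 0, sumSq = 0;
        for (int t = 0; t < trials; t++) {
            double growth = mu + sigma * rng.nextGaussian(); // NORMINV(RAND();mu;sigma)
            double salary = salary0 * (1 + growth);
            double net = salary - salary * taxRate;
            sum += net;
            sumSq += net * net;
        }
        double mean = sum / trials;
        double sd = Math.sqrt((sumSq - trials * mean * mean) / (trials - 1));
        return new double[] { mean, sd };
    }
}
```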
9.2.1.2 Test II
For the second experiment, we consider the case where all three input variables are
uncertain: tax rate, salary rate, and initial salary. Tax rate will now be modeled using
a normal distribution with μ = 17 and σ = 3, the salary rate remains at μ = 0.07 and
σ = 0.01, and initial salary will now have a normal distribution of μ = 60000 and
σ = 1000.
Test II-A Using our system the construction cost is the same, as the user simply
enters “17@3” for the tax rate field and “60000@1000” for the initial salary. Total
cost: 10α +4δ +3ζ .
Test II-B Using an analytical approach in a traditional spreadsheet requires two fields
where previously there was one: one to hold the mean and one for standard deviation
for tax rate. This adds 2α , one for the text label and one for the additional field. The
standard deviation fields are already required to store the additional information in-
troduced by uncertainty in salary. However, the tax is now more complex as it must
combine the uncertain salary with the uncertain tax rate. In this case it is the multi-
plication of two normal distributions, the formula for which was described previously.
The remaining cost calculation is the same as in Test I-B. Total cost: 18α +7δ +3ζ ,
an increase of 8α +3δ over the reference sheet.
Test II-C Expanding the Monte-Carlo spreadsheet consists of using the pseudo-
random number generation for three fields. Thus the cost is 3δ for the first three
columns. Total cost: 23α +12δ +4ζ , an increase of 13α +8δ +1ζ over the reference
sheet.
9.2.2 Retrospection Experiments
In these experiments we evaluate the cost of changing uncertainty modeling techniques
in existing spreadsheets.
9.2.2.1 Test III
Our first test is to retrospectively upgrade the salary rate from a normal quantity to a
normal distribution. For this scenario the change was not anticipated by the user and
therefore the starting point is the reference spreadsheet, as shown in Figure 9.1. Thus, the
resulting spreadsheets from Test III should be the same as those in Test I.
Test III-A Using our system, the user simply modifies the salary growth field. Total
cost: 1β .
Test III-B Using the analytical approach in a traditional spreadsheet requires a change
in layout:
• First, the columns for standard deviation are added (3γ) and labeled (3α).
• The text labels for salary, tax, and net are updated to reflect that they are now
means (3β ).
• Then the text label for salary growth is changed to “salary growth mean” (1β ).
• If the new mean were different to the current value of the salary growth it would
need to be updated.
• The standard deviation text label is added and the standard deviation is entered
(2α).
• Zeros are added for the standard deviation columns in year 1 (3α).
• Formulae are entered for the standard deviation columns in year 2 (3δ )
• The row for year 2 is replicated for years 3-7, which is a short-cut to fill in
missing formulae (1ζ ).
Total cost: 8α + 4β + 3γ + 3δ + 1ζ (α: 3 + 2 + 3; β: 3 + 1; γ: 3; δ: 3; ζ: 1).
Test III-C Changing to a Monte-Carlo spreadsheet requires significant work as it
completely changes the spreadsheet layout. Typically this involves starting a new
spreadsheet:
• A new sheet is added to the system (1γ)
• Copy+paste the first two column headings, enter the remaining labels (21α +2ζ )
• Copy+paste the tax rate and initial salary (2ζ )
• Enter the formula to generate the salary growth rate from the pseudo-random
number generator (1δ )
• The calculated columns for tax and net income for year 1 are entered (2δ ).
• The calculated columns for salary, tax, and net income are replicated for years 2
through 7 (1δ +2ζ ).
• The prototype row is copied to rows below (1ζ ).
• Calculate mean and standard deviation for salary, tax, and net income and repeat
for all 7 years (6δ +1ζ )
Total cost: 21α + 1γ + 10δ + 8ζ. This cost is virtually equivalent to the cost for
construction in Test I-C, 25α + 10δ + 4ζ.
9.2.2.2 Test IV
For the next set of experiments we consider the reverse, removing the uncertainty infor-
mation from salary rate. The resulting spreadsheet should be the reference spreadsheet,
as shown in Figure 9.1.
Test IV-A Using our system, the salary rate is changed back to “7%”. Total cost: 1β .
Test IV-B Using the analytical approach in a traditional spreadsheet again requires a
change in layout:
• First, the columns for standard deviation are removed (3γ).
• The text labels for salary, tax, and net are updated to reflect that they are no
longer means (3β ).
• Then the text label for salary growth is changed to “salary growth” (1β ).
• The standard deviation field for salary growth is deleted (1η). (the text label was
removed when the column it was in was deleted)
Total cost: 4β +3γ +1η .
Test IV-C Assuming that the original sheet was kept when the new sheet was added
for the simulation, then the simulation sheet would simply need to be removed. Total
cost: 1γ .
9.2.2.3 Test V
For the next set of tests the spreadsheet already contains the salary growth as a nor-
mal distribution and the tax rate is promoted to a normal distribution. The starting
conditions for Test V are the spreadsheets constructed in Test I.
Test V-A Using our system the tax rate field is changed. Total cost: 1β .
Test V-B Using the analytical approach in a traditional spreadsheet requires some
changes:
• The text label for the tax rate is changed to reflect that it is now a mean (1β )
• A standard deviation field is added and labeled for the tax rate (2α)
• The standard deviation calculation for tax in year 2 is updated (1ε)
• The standard deviation calculation is propagated to the other rows (1ζ )
Total cost: 2α +1β +1ε +1ζ .
Test V-C The field for tax rate is calculated using the pseudo-random number gener-
ator (1δ ) and this change is propagated to the other rows using copy+paste. Total cost:
1δ +1ζ .
9.2.3 Discussion
Figure 9.5 shows the construction costs graphically. The main type of operation carried
out during construction was data entry, α , followed by formula construction, δ . The
construction cost of our system is equivalent to that of the reference spreadsheet. There
were only five more keystrokes in Test I-A and seven more keystrokes in Test II-A
compared to the construction of the reference spreadsheet. Significantly, the spread-
sheet layout and all formulae are identical to the reference spreadsheet.
Figure 9.5: Construction Cost Graph
Both analytical probability and Monte-Carlo spreadsheets required a change in layout
and an increase in formula counts. Figure 9.6 compares the number of formulae
required. Although there are more formulae in the Monte-Carlo spreadsheets, they are
more consistent than their counterparts in the analytical spreadsheet. Combinations
of multiple uncertain variables quickly results in complicated formulae in analytical
sheets.
Figure 9.6: Number of Formulae in Construction Experiments
Table 9.3 lists the operations required for the retrospection experiments. In all
three experiments our system required only a single data field to be changed. Both
of the other methods required changes to either formulae or layout (see Figure 9.7),
irrespective of whether uncertainty information was being added (Tests 3 and 5) or
removed (Test 4).
Test   α    β    γ    δ    ε    ζ    η
3-A         1
3-B    8    4    3    3         1
3-C    21        1    10        8
4-A         1
4-B         4    3                   1
4-C              1
5-A         1
5-B    2    1              1    1
5-C                   1         1
Table 9.3: Retrospection Cost
Figure 9.7: Formula and Layout Changes During Retrospection Experiments
The results from this section are significant for three reasons. Firstly, fewer op-
erations are needed when using our system, which means faster construction times
and fewer chances for accidental errors. Secondly, our system requires no formulae
or layout changes. Prior research has shown that formulae are the principal source of
spreadsheet errors. However, layout changes are also significant because they require
the user to alter their mental map of the spreadsheet. This potentially leads to com-
prehension issues and increases the likelihood of incorrect cell references. Thirdly, it
made little difference in our system whether information uncertainty had been antici-
pated or not. In contrast, traditional spreadsheet methods had significant formula and
layout costs for introducing and removing information uncertainty.
9.3 Sensitivity Analysis Surveys

Two studies were conducted using a sensitivity analysis scenario. The first sought subjective feedback from participants drawn from a variety of backgrounds. The second study was designed to gain a more detailed evaluation from financial experts.
9.3.1 First Survey

In the first survey, a sensitivity analysis scenario was sent to participants, whose backgrounds are listed in Table 9.4. The survey is attached in Appendix A.
3 financial analysis experts
3 frequent spreadsheet users
1 mathematics expert
4 skilled computer users (who did not frequently use spreadsheets)
Table 9.4: Respondents to the Survey
The examples were produced using IvySheet as at Feb 2007. The graphs are uncertainty line graphs that use the UUM, as described in Section 6.3.3. They were embedded within the spreadsheet and displayed using floating observers. The questions and selected responses follow.
1. Did the interval approach make sense to you? This question was answered
positively by all respondents. However, one of the finance experts commented that
intervals can be achieved
“using a sensitivity matrix ie variables across one axis and results across
the other. Often times this can be more useful because you can see how
sensitive the result is to the change in variable.”
They later noted that a probability distribution would address this situation. Addition-
ally, they went on to say that
“I can see that presenting in this new way is powerful when it comes to
planning because you can show the range of probable outcomes as op-
posed to a whole series of data points.”
which indicates that the polygonal visualization was useful. One of the financial ex-
perts described the system as “very intuitive”.
2. Would you consider this a more useful tool than a standard spreadsheet? This
question was answered positively with one exception: one of the financial expert re-
spondents was undecided. They described their position as,
“Not sure that I’d say its more useful than a standard spreadsheet but it
presents the data very clearly which could be useful for presentations.”
The other financial expert commented that
“This tool will save much time ... It requires a single edit instead of itera-
tive input changes and structure changes to store the results. Even MatLab
requires more effort when investigating variables that are uncertain.”
3. Would you like to use a tool like this? This question was answered positively
with one exception: the infrequent spreadsheet user indicated that they would not use
this system. Their reason was:
“Not a visualizing person.”
The mathematics expert answered positively and gave the following reason:
“Because it gives an immediate visual feedback on the uncertainty.”
4. What would you change? Many of the respondents shared the view that prob-
ability distributions would be particularly useful. Notably, all financial experts made
this clear, for example:
“I would like to apply probability distributions & see the results.”
The mathematics expert expressed a desire to
“specify a probability distribution over the interval (or the real line) and
then visualize this ... by colour/shading intensity. [T]here could be a stan-
dard set of distributions to choose from e.g. Gaussian, binomial, poisson,
etc.”
Two of the frequent spreadsheet users also suggested an interactive brushing on the
graph, so as to determine which variances contributed to that point.
5. Do you have any other comments about what you have seen? Responses to
this question primarily expanded on answers to question 4 above. One of the skilled
computer users asked the following question:
“Can I manipulate the data graphically (dragging the line/s in the graph)
instead of typing in variances via the spreadsheet?”
This again implies that interactivity in the visualization is worthy of future work. The
mathematics expert warned against using the term variance to describe intervals.
9.3.2 Second Survey

For the second study, three financial experts were asked to complete tasks using the software and answer questions on a Likert scale [22]. The participants were selected for their familiarity with sensitivity analysis techniques.
The materials used for the experiment were:
• 1 PC
– OS: Microsoft Windows XP SP2 (32-bit)
– Java Runtime: Sun Microsystems JRE 1.6.0_01
– CPU: AMD Athlon 64 X2 Dual Core 2 GHz
– RAM: 1 GB
– Display: 1680x1050, 32-bit color
• IvySheet as at March 2007, our prototype system
• Entrance Questionnaire
• Description of Tasks
• Exit Questionnaire
• Observations Sheet
The method used in the experiment is as follows.
1. Participants were first given an entrance questionnaire to determine their background experience.
2. They were then seated before a new instance of the IvySheet software and asked
to perform three main tasks. During this time they were observed and their
performance was recorded.
3. Upon completion of the tasks, the participants were asked to complete an exit
questionnaire.
The questionnaires, tasks, observations sheet, and results are attached in Appendix B.
Three financial experts volunteered to take part in the study. One of the participants
(RS) had also taken part in the first survey, the other two had not. The entrance ques-
tionnaire asked the participants to rate their experience in four areas on a four-point
scale (1-None, 2-Beginner, 3-Intermediate, 4-Advanced). Figure 9.8 shows the aver-
age rating. Volunteers considered themselves proficient with spreadsheets and finan-
cial techniques, but rated their familiarity with visualization tools between none and
beginner level. There was a reasonable degree of confidence with uncertainty methods.
These results suggest that the participants are well placed to evaluate comprehension
of the uncertainty propagation and would have little trouble using the spreadsheet soft-
ware.
[Bar chart: average self-rated experience (scale 1.00–4.00) in Spreadsheets, Visualization, Uncertainty, and Financials.]
Figure 9.8: Background of Respondents
The times taken to conduct each task were measured to the nearest minute. This
includes the time taken to read the instructions. The timer was stopped when a par-
ticipant signalled that they had completed the step and were confident of their results.
Figure 9.9 shows the average time to complete each step. The first step involves build-
ing a spreadsheet for the scenario. It was the most time consuming step and participants
spent approximately one third of their time checking their work. The second step was
to create a graph, and this was the quickest of the steps. The only significant amount of
time was spent by one participant, who then requested assistance. This was because
they were unsure whether to label the axes. The majority of time in the third step was
spent by participants checking their results.
Participants were asked to ensure that their work was correct before signalling that
they had completed a step. Correctness was then verified by the observer, who recorded
[Bar chart: average time to complete Steps 1–3, in minutes (scale 0–10).]
Figure 9.9: Average Completion Time
the number of errors made. Any errors were corrected by the observer before proceeding to the next step. During the course of performing a step the participants would spot and correct errors in their own work. These events were recorded as corrections and not as errors. Only a single uncorrected error was found, which was due to an oversight error while composing a formula in the first step. This type of oversight error is common in spreadsheet tasks [88, 87] and was not related to uncertainty comprehension. The lack of errors, especially in the second and third steps, indicates that participants were comfortable and proficient with the software and tasks.
Upon completion, the participants were asked to complete an exit questionnaire
that used a five-point Likert scale (1-Strongly Disagree, 2-Disagree, 3-Neither Agree
nor Disagree, 4-Agree, 5-Strongly Agree). Figure 9.10 shows the questions and average responses. There was general congruence in the responses, with an average standard deviation of only 0.43.¹
These results show that the participants found the encapsulation of uncertainty details intuitive and felt that it aided comprehension (Q1-Q4). They did not have issues with the additional information (Q4, Q7, Q8). The real-time graph was marginally useful, but would likely find greater use in more complex scenarios (Q5, Q6). The participants wanted to do more uncertainty modeling than they currently do, and they expressed a desire to use a tool like IvySheet (Q9-Q13). In the comments section, one of the participants noted that they primarily use Monte Carlo techniques for dealing with uncertainty.
¹ Detailed results are listed in Appendix B.
1. Overall, I found the system to be intuitive
2. Placing an interval in a cell made sense to me
3. I would need more training to complete the tasks
4. I found it easy to follow the effect that an interval had on the rest of the spreadsheet
5. The graph helped me to understand the effects better
6. I would definitely use the graph if there were many numbers
7. The spreadsheet became too cluttered or hard to read
8. This tool makes it easier to make mistakes
9. I would use a tool like this if it were available to me
10. I would use intervals in Microsoft Excel if they were supported
11. My current tools are adequate for modeling and visualizing uncertainty
12. I would use normal probability distributions more often in my work if I had a tool like this
13. I would model uncertainty more often if I had these features available
Figure 9.10: Questions and Average Responses
9.4 Case Study: Business Planning

Virtually every commercial enterprise will engage in business planning. Business plans are used to explore future directions, align strategic objectives, and support applications for financing. Good business plans are not only intricate, but also need to clearly communicate their contents. The nature of a business plan is to explore the unknowns of the future and how these will be dealt with [7].
This case study explores the application of our information uncertainty modeling
and visualization framework to business planning. Business plans can be divided into
three main parts: the strategic plan, which includes goals and management; the mar-
keting plan, which includes market research and analysis; and the operational plan,
which includes operations and finance.
Information uncertainty is most evident in the Marketing plan. The Marketing plan involves predicting market trends, potential sales, and similarly uncertain information. These uncertainties flow on to the Operational and Strategic plans, which specifically outline how these challenges will be met. The Operational plan explores survivability, profitability, exit points, and financing needs based on differing market conditions. The Strategic plan incorporates management issues, such as staffing and resources, which can include information uncertainty.
In this case study we consider the business plan for XYZ Software,² a newly formed spin-off company from an existing publishing company, ABC Publishing. ABC
² These names are fictional.
Publishing has a long history of publishing scientific books and journals, and therefore
they have access to market information for print-based publications. However, XYZ
Software plans to work in a new space: the digital publication field, which is growing
rapidly and for which the data is not yet stable. The business plan is being prepared to
raise funding for the new venture, which will be run as an entirely separate company
to ABC Publishing.
A spreadsheet was built to provide the supporting information for the business plan.
We now analyze the advantages of our system for this purpose.
Hierarchical Structure Business plans tend to include large spreadsheets with many
calculations. To alleviate complexity, we use hierarchical spreadsheets to build a struc-
ture that groups information into useful categories. Figure 9.11 shows the hierarchical
structure for XYZ’s business plan. This makes access to relevant information more in-
tuitive than selecting the sheet from a linear list. Furthermore, the sheet names tend to
be longer when arranged in a linear list as they do not benefit from the implied labeling
given by hierarchical composition. For example, the Year 1 sheet under Balance Sheet
would otherwise likely be given the name “Balance Sheet Y1”.
Figure 9.11: Hierarchical Business Plan Spreadsheet
Incremental Refinement The spreadsheet can be built top-down, since details could
be left to subsheets. For example, in Figure 9.12, the cell in B1 was originally a high
level estimated value. As the work on the business plan progressed, this cell was
replaced by an embedded spreadsheet, the contents of which is shown in Figure 9.13.
Figure 9.12: Market Share Overview with Embedded Sheets
The uncertainty details can similarly be refined as information becomes available. The predicted values for market share can initially be modeled using single-valued estimates, then promoted to normal distributions as more detailed predictions are made.
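This refinement workflow can be sketched as follows. The `Normal` class and the `projected_share` formula are illustrative names of our own, not IvySheet's actual API; the point is that the downstream formula is written once and survives the promotion of an input from a single-valued estimate to a distribution.

```python
import math
from dataclasses import dataclass

@dataclass
class Normal:
    """Illustrative normally distributed cell value (mean, standard deviation)."""
    mu: float
    sd: float

    def __add__(self, other):
        # Promote a plain number to a zero-spread normal, then add.
        o = other if isinstance(other, Normal) else Normal(float(other), 0.0)
        return Normal(self.mu + o.mu, math.hypot(self.sd, o.sd))
    __radd__ = __add__

def projected_share(segment_a, segment_b):
    # Downstream formula: never revised during refinement.
    return segment_a + segment_b

# Initial plan: single-valued estimates (e.g. subscribers, in thousands).
print(projected_share(120, 80))                # 200

# Later: one input is promoted to a distribution; the formula is unchanged.
print(projected_share(Normal(120, 15.0), 80))  # Normal(mu=200.0, sd=15.0)
```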
Compensate for Missing Details The first step in the marketing plan is to establish
the target market. A survey of ABC Publishing customers who would use the new
XYZ Software service was broken down by age. Complete information on their classi-
fication into customer types was unavailable and consequently this was estimated using
intervals. These intervals compensate for the fact that the information was not known.
The target market is shown in Figure 9.14.
Summary Observation Sometimes it is convenient to keep an eye on a high level
summary value while building lower level contributions. Floating observers can ob-
serve any cell and therefore a floating observer can be created for the high level sum-
mary value. Traditional spreadsheets support a “split view”, where the window is
divided into viewports. However, this feature extends to the current sheet only and is
of no use when the summary value is on another page. The floating observer is created
in its own window and therefore is unaffected by which spreadsheet is selected.
Adding a New Uncertainty Modeling Technique In certain circumstances a value must not dip below zero; percentages are one example. It was still desirable to model these as normal distributions, but clamped to only positive values. One solution is to add a new modeling technique, called a clamped normal, for which any value outside a range is clamped.
Figure 9.13: Market Share for Year 1
Figure 9.14: Target Market
Simplified Propagation A promoter is defined for promoting a normal distribution
to a clamped normal with the range [−∞,+∞]. A propagation method for arithmetic
between clamped normals is added to the propagation model. All other combinations
are handled using hierarchical heterogeneous propagation. Thus, arithmetic between a
quantity and a clamped normal results in the quantity x being promoted to a clamped
normal p = (μ = x, σ = 0, [−∞,+∞]) before the operation is carried out.
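A sketch of this arrangement follows; the class and function names, and the rule for combining clamp ranges, are our own illustrative choices rather than IvySheet's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Normal:
    mu: float
    sd: float

@dataclass
class ClampedNormal:
    """A normal distribution whose values are clamped to [lo, hi]."""
    mu: float
    sd: float
    lo: float = -math.inf
    hi: float = math.inf

def promote(value):
    """Promoter: lift plain quantities and normals to clamped normals."""
    if isinstance(value, ClampedNormal):
        return value
    if isinstance(value, Normal):
        return ClampedNormal(value.mu, value.sd)   # range [-inf, +inf]
    return ClampedNormal(float(value), 0.0)        # quantity x -> (mu=x, sd=0)

def add(a, b):
    """One propagation rule, for addition between clamped normals.
    Mixed operands are promoted first (heterogeneous propagation).
    Summing the clamp bounds elementwise is one plausible choice."""
    a, b = promote(a), promote(b)
    return ClampedNormal(a.mu + b.mu, math.hypot(a.sd, b.sd),
                         a.lo + b.lo, a.hi + b.hi)

# Arithmetic between a quantity and a clamped normal: the quantity is
# promoted to (mu=5, sd=0, [-inf, +inf]) before the operation.
result = add(5.0, ClampedNormal(10.0, 2.0, 0.0, math.inf))
print(result.mu, result.sd)   # 15.0 2.0
```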
Probabilistic Break-even The most critical part of a business plan is the break-even analysis. Break-even analyses are carried out before any venture is embarked upon, as they are an instrumental indicator of potential success. Using traditional methods,
the break-even analysis is performed under several scenarios, which is a numerical
approach to information uncertainty. The break-even analysis shows the cumulative
profitability over the projected period (e.g. Figure 9.15). The point at which the prof-
itability becomes non-negative is the break-even point, in this case the second quarter
of Year 3. Using our system we have been able to model most variables using normal
distributions. The interactions between these numerous normally distributed independent variables culminate in a normally distributed profit prediction. This allows us to
plot the probability of profitability, as shown in Figure 9.16. As can be seen from the
graph, we are likely to become profitable in the third year, but even after 3 years there
is still a chance that the venture will not yet be profitable.
Figure 9.15: Typical Break-even Visualization
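The probability plotted in such a graph follows directly from the normal profit prediction: for cumulative profit distributed N(μ, σ²), the probability of being non-negative is Φ(μ/σ), where Φ is the standard normal CDF. A minimal sketch with hypothetical quarterly figures:

```python
import math

def prob_profitable(mu, sd):
    """P(profit >= 0) when cumulative profit ~ N(mu, sd^2)."""
    return 0.5 * (1.0 + math.erf(mu / (sd * math.sqrt(2.0))))

# Hypothetical cumulative-profit predictions (mean, std. dev., in $k).
quarters = [(-120, 30), (-40, 35), (25, 40), (80, 45)]
for mu, sd in quarters:
    print(f"mu={mu:5d}  P(profitable)={prob_profitable(mu, sd):.2f}")
```

As the mean crosses zero the probability passes 0.5, but the tails make clear that even quarters with a positive expected profit carry a real chance of remaining unprofitable.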
In summary, IvySheet provided several benefits over using a traditional spreadsheet
for business planning. The naturally hierarchical structure of business planning could
be transferred to the spreadsheet. Planning could proceed before all details were avail-
able and the details could be added as they became available. Uncertainty information
could be added without significant alterations and where a new technique was required,
an extension for it could be created. Due to the propagation of the uncertainty details,
the spreadsheet provided more information than its traditional counterparts. This en-
ables better informed support tools, such as the probabilistic break-even analysis.
Figure 9.16: Probabilistic Break-even Visualization
CHAPTER 10
Conclusion and Future Work
“It is not certain that everything is uncertain.”
– Blaise Pascal (Pensées, 1670)
10.1 Achievements

In this thesis we have introduced an integrated information uncertainty modeling and
visualization system, which intrinsically supports information uncertainty modeling,
automated uncertainty propagation, and uncertainty model abstracted visualization.
This significantly lowers the barrier to the uptake of information uncertainty modeling
and visualization.
There are three significant benefits. Firstly, users are able to build their data model using whichever uncertainty modeling techniques are appropriate. They are no longer limited by prior modeling choices, enabling them to capture a greater amount of information, and therefore improving the fidelity of the data model. Secondly, the system is
intrinsically aware of the uncertainty, protecting it against user-induced errors. Users do not require detailed comprehension of the uncertainty mechanics in order to use a modeling technique, since these are handled automatically. Lastly, visualization techniques have been abstracted from the underlying uncertainty data type, enabling them to continue to function when the uncertainty modeling technique changes. This enables users to create visualizations in the face of changing uncertainty.
We extended the spreadsheet paradigm to intrinsically support information uncer-
tainty. We take a two-fold approach to this: encapsulation of uncertainty information
and abstraction of uncertainty for visualization.
Encapsulation
In this thesis, we argued for the encapsulation of information and its uncertainty, where
a variable and its uncertainty are treated as a unit. We devised the uncertainty prop-
agation model, which enables automated propagation of information uncertainty. By
using encapsulation, uncertainty details are treated at the sub-variable level and can be
added, changed, or removed at any point in the process. This means that users do not
have to anticipate their uncertainty requirements before building the data model, and
the uncertainty information can be easily revised as more becomes known.
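The essence of the encapsulation argument can be sketched with a toy interval type (our own illustration, not the thesis's propagation model in full): the quantity and its uncertainty travel as one unit, so formulas need no revision when uncertainty is added, changed, or removed.

```python
from dataclasses import dataclass

@dataclass
class Uncertain:
    """Toy encapsulated cell value: a quantity plus interval half-widths."""
    value: float
    below: float = 0.0
    above: float = 0.0

    def __add__(self, other):
        o = other if isinstance(other, Uncertain) else Uncertain(float(other))
        return Uncertain(self.value + o.value,
                         self.below + o.below, self.above + o.above)
    __radd__ = __add__

    def __mul__(self, k):  # scaling by an exact constant
        return Uncertain(self.value * k, self.below * k, self.above * k)

# The revenue formula is written once; propagation is automatic.
price = Uncertain(9.5, 0.5, 0.5)   # uncertainty lives at the sub-variable level
units = 1000
print(price * units)               # Uncertain(value=9500.0, below=500.0, above=500.0)

# Removing the uncertainty later changes only the data, never the formula.
print(Uncertain(9.5) * units)      # Uncertain(value=9500.0, below=0.0, above=0.0)
```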
Abstraction
We designed uncertainty abstraction models, which form the interface between model-
ing techniques and visual mappings. By employing an uncertainty abstraction model,
visualization techniques can be developed for abstract plural values, freeing them from
dependence on any particular data type. To guide users in the selection of appropri-
ate visual mappings, we defined user-objectives for uncertainty visualization. User-
objectives provide a data type abstracted means of describing visualizations and we
explored a user interface for selecting a user-objective.
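One minimal way to sketch such an abstraction model (the interface shown is our own illustration; the thesis's abstract plural values are richer than a three-tuple): each modeling technique reduces itself to a common summary, and a visual mapping consumes only that summary, so it keeps working when the technique changes.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float
    def abstract(self):
        # (representative, lower, upper)
        return ((self.lo + self.hi) / 2, self.lo, self.hi)

@dataclass
class Normal:
    mu: float
    sd: float
    def abstract(self):
        # representative plus an approximate 95% band
        return (self.mu, self.mu - 2 * self.sd, self.mu + 2 * self.sd)

def render_band(cells):
    """A visual mapping that never inspects the concrete uncertainty type."""
    return [cell.abstract() for cell in cells]

# The same graph definition survives a change of modeling technique.
print(render_band([Interval(8, 12), Normal(10.0, 1.0)]))
```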
10.2 Limitations

Our integrated information uncertainty modeling and visualization system is designed for a particular type of uncertainty: that which can in some way be described for a particular unit of information. This is a fundamental limitation, as our approach seamlessly integrates information with its uncertainty. Any uncertainty that cannot be so defined is therefore unsuitable for our approach.
10.3 Possible Applications and Extensions

The spreadsheet system presented here can be most readily applied in fields where spreadsheets and information uncertainty are both already in common usage. For example, the financial markets, marketing, and planning fields are natural targets. It would be worth exploring the development of our approach as an add-on to existing commercial spreadsheet software, as this would enable users to leverage their existing systems.
Many other fields make use of information uncertainty for a variety of tasks, such
as planning and predicting. In many instances custom software or script-based math-
ematical software is used (e.g. MatLab). These users suffer the same drawbacks as
traditional spreadsheet users do and would similarly benefit from our framework. Fur-
ther work might consider exploring the encapsulation and abstraction approaches for
such custom and script-based systems.
The visual mapping of uncertainty information is done using the formula language.
Future work could explore user interfaces to aid the visual mapping process. The user
interface design for selecting user-objectives would provide a good starting point. How
can this be expanded to effectively aid the user to construct useful visualizations? Can
the system learn user preferences and thereby improve the options shown?
The prototype system relied on a text entry mechanism for the user to specify the
uncertainty and information for a particular variable. New interfaces might be de-
veloped to explore more intuitive ways of eliciting uncertainty information from the
user. For example, a graphical interface that enables users to draw a fuzzy membership
function may be appropriate.
Interaction with visualizations has previously been shown to be useful. For exam-
ple, brushing is commonly available even in graphs produced by traditional spread-
sheet systems. As comments from the survey indicated, interactivity is a feature that
users value, particularly when it comes to uncertainty. Further work should be carried
out to investigate how interaction can be incorporated into the current framework. For
example, is it appropriate to set the value of a cell elsewhere in the spreadsheet, as
per [65]?
Abstraction from the uncertainty information for visualization opens an opportu-
nity for research into visualization techniques using this approach. For example, how
might a pie chart or a tree map be extended to incorporate plural values?
Bibliography
[1] Ken Arnold, James Gosling, and David Holmes. The Java programming language. Addison-Wesley, 4th edition, 2006.
[2] Bilal M. Ayyub and George J. Klir. Uncertainty modeling and analysis in engineering and the sciences. Chapman & Hall/CRC, Boca Raton, FL, 2006.
[3] Humberto Barreto and Frank Howland. Introductory Econometrics: Using
Monte Carlo Simulation with Microsoft Excel. Cambridge University Press,
2006.
[4] L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva,
and H. T. Vo. Vistrails: enabling interactive multiple-view visualizations. In
Visualization, 2005. VIS 05. IEEE, pages 135–142, 2005.
[5] M. Berthold and L. Hall. Visualizing fuzzy points in parallel coordinates. IEEE
Transactions on Fuzzy Systems, 11:369–374, June 2003.
[6] M. R. Berthold and R. Holve. Visualizing high dimensional fuzzy rules. pages
64–68, 2000.
[7] Edward Blackwell. How to prepare a business plan. Kogan Page, 2004.
[8] Grady Booch, James Rumbaugh, and Ivar Jacobson. The unified modeling lan-
guage user guide. Addison-Wesley, 1999.
[9] Polly S. Brown and John D. Gould. An experimental study of people creating
spreadsheets. ACM Transactions on Office Information Systems, 5(3):258–272,
July 1987.
[10] R. Brown. Animated visual vibrations as an uncertainty visualization technique.
In International Conference on Graphics and Interactive Techniques in Aus-
tralasia and South East Asia, pages 84–89. ACM, June 2004.
[11] R. Brown and B. Pham. Visualisation of fuzzy decision support information: A
case study. In IEEE International Conference on Fuzzy Systems, pages 601–606,
St Louis, 2003.
[12] M. M. Burnett, M. J. Baker, C. Bohus, P. Carlson, S. Yang, and P. Van Zee.
Scaling up visual programming languages. Computer, 28(3):45–54, 1995.
[13] Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger,
Claudio T. Silva, and Huy T. Vo. Vistrails: visualization meets data manage-
ment. In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD International
Conference on Management of Data, pages 745–747, New York, NY, USA,
2006. ACM Press.
[14] Alexander Campbell, Erik Berglund, and Alexander Streit. Graphics hardware
implementation of the parameter-less self-organising map. Intelligent Data En-
gineering and Automated Learning - IDEAL 2005, pages 343–350, 2005.
[15] S. K. Card and J. Mackinlay. The structure of the information visualization
design space. pages 92–99, 125, 1997.
[16] E. Chi. A taxonomy of visualization techniques using the data state reference
model. In IEEE Symposium on Information Visualization, pages 69–75. IEEE
Press, Oct 2000.
[17] E. H. Chi and S. K. Card. Sensemaking of evolving web sites using visualization
spreadsheets. pages 18–25, 142, 1999.
[18] Ed Huai-Hsin Chi and J. T. Riedl. An operator interaction framework for visu-
alization systems. pages 63–70, 1998.
[19] Helen Couclelis. The certainty of uncertainty: Gis and the limits of geographic
knowledge. Transactions in GIS, 7(2):165–175, 2003.
[20] Z. Cox, J.A. Dickerson, and D. Cook. Visualizing membership in multiple clus-
ters after fuzzy c-means clustering. In R. Erbacher, P. Chen, J. Roberts, C. Wit-
tenbrink, and M. Grohn, editors, Visual Data Exploration and Analysis VIII,
volume 4302, pages 60–68. SPIE Bellingham, Washington, 2001.
[21] I. Cruz. Tailorable information visualization. ACM Computing Surveys, 28(4),
December 1996.
[22] David de Vaus. Analyzing social science data. SAGE, 2002.
[23] J.A. Dickerson, Z. Cox, E.S. Wurtele, and A.W. Fulmer. Creating metabolic and regulatory network models using fuzzy cognitive maps. In M.H. Smith, W.A. Gruver, and L.O. Hall, editors, IFSA World Congress and 20th NAFIPS International Conference, 2001. Joint 9th, volume 4, pages 2171–2176, 2001.
[24] S. Djurcilov, K. Kim, P. Lermusiaux, and A. Pang. Volume rendering data with
uncertainty information. In D. Ebert, J.M. Favre, and R. Peikert, editors, Data
Visualization 2001, pages 243–252, 355–356. Springer, 2001.
[25] S. Djurcilov, K. Kim, P. Lermusiaux, and A. Pang. Visualizing scalar volumetric
data with uncertainty. Elsevier Computers & Graphics, 26:239–248, 2002.
[26] Mark Dodge and Craig Stinson. Microsoft Office Excel 2007 : inside out. Mi-
crosoft Press, 2007.
[27] D. J. Duke, K. W. Brodlie, and D. A. Duce. Building an ontology of visualiza-
tion. pages 7–7, 2004.
[28] David H. Eberly. 3D game engine design : a practical approach to real-time
computer graphics. Morgan Kaufmann, 2001.
[29] EURACHEM and CITAC Measurement Uncertainty Working Group. Quantifying Uncertainty in Analytical Measurement. Eurachem, 2nd edition, 2000.
[30] Marc Fisher II, Gregg Rothermel, Darren Brown, Mingming Cao, Curtis Cook, and Margaret Burnett. Integrating automated test generation into the WYSIWYT spreadsheet testing methodology. ACM Trans. Softw. Eng. Methodol., 15(2):150–194, 2006.
[31] P. Fisher. Visualizing uncertainty in soil maps by animation. Cartographica,
30(2+3):20–27, 1993.
[32] Y. Fujiwara, M. Shirashi, D. Nakagawa, and S. Okada. Visualization of the
rule-based program by a 3d flowchart. In 6th International Conference on Fuzzy
Theory and Technology (JCIS), volume 3, pages 250–254, NC, USA, Oct 1998.
[33] N. Gershon. Visualization of an imperfect world. Computer Graphics and
Applications, IEEE, 18(4):43–45, 1998.
[34] N. D. Gershon. Visualization of fuzzy data using generalized animation. pages
268–273, 1992.
[35] John Geweke, William McCausland, and John Stevens. Using Simulation Meth-
ods for Bayesian Econometric Models, chapter 8. Marcel Dekker, Inc., 2003.
[36] M.F. Goodchild, D.R. Montello, P. Fohl, and J. Gottsegen. Fuzzy spatial queries
in digital spatial data libraries. In IEEE World Congress on Computational
Intelligence Fuzzy Systems Proceedings, volume 1, pages 205 – 210, 4-9 May
1998.
[37] G. Grigoryan and P. Rheingans. Probabilistic surfaces: point based primitives to
show surface uncertainty. In Visualization, 2002. VIS 2002. IEEE, pages 147–
153, 2002.
[38] G. Grigoryan and P. Rheingans. Point-based probabilistic surfaces to show sur-
face uncertainty. Visualization and Computer Graphics, IEEE Transactions on,
10(5):564–573, 2004.
[39] L. O. Hall and M. R. Berthold. Fuzzy parallel coordinates. pages 74–78, 2000.
[40] Charles D. Hansen and Chris Johnson. Visualization Handbook. Academic
Press, December 2004.
[41] S. Henderson. Vised: Visualization techniques, 1996. Retrieved 25 June 2004.
[42] T. Hengl. Visualisation of uncertainty using the hsi colour model: computations
with colours. In 7th International Conference on GeoComputation, page 8,
2003.
[43] Hugues Hoppe. Progressive meshes. In SIGGRAPH ’96: Proceedings of
the 23rd annual conference on Computer graphics and interactive techniques,
pages 99–108, New York, NY, USA, 1996. ACM Press.
[44] D. Howard and A. M. MacEachren. Interface design for geographic visualiza-
tion: Tools for representing reliability. Cartography and Geographic Informa-
tion Systems, 23(2):59–77, 1996.
[45] Ed Huai-hsin Chi, Joseph Konstan, Phillip Barry, and John Riedl. A spreadsheet
approach to information visualization. In UIST ’97: Proceedings of the 10th
Annual ACM Symposium on User Interface Software and Technology, pages
79–80, New York, NY, USA, 1997. ACM Press.
[46] Ed Huai-hsin Chi, John Riedl, Phillip Barry, and Joseph Konstan. Principles for
information visualization spreadsheets. IEEE Computer Graphics and Applica-
tions, 18(4):30–38, 1998.
[47] G.J. Hunter and M.F. Goodchild. Managing uncertainty in spatial databases:
Putting theory into practice. Journal of the Urban and Regional Information
Systems Association, 5(2):55–62, 1993.
[48] Tomas Isakowitz, Shimon Schocken, and Henry C. Lucas, Jr. Toward a logical/physical theory of spreadsheet modeling. ACM Trans. Inf. Syst., 13(1):1–37, 1995.
[49] T. J. Jankun-Kelly, K.-L. Ma, and M. Gertz. A model for the visualization
exploration process. In IEEE Visualization, pages 323–330, 2002.
[50] B. Jiang, J. L. Wang, and Y. C. Soh. Robust fault diagnosis for a class of bilinear systems with uncertainty. In IEEE Conference on Decision and Control, volume 5, pages 4499–4504. IEEE, 1999.
[51] C.R. Johnson and A.R. Sanderson. A next step: Visualizing errors and un-
certainty. IEEE Computer Graphics and Applications, 23(5):6–10, Septem-
ber/October 2003.
[52] D. Kao, J. Dungan, and A. Pang. Visualizing 2d probability distributions from
eos satellite image-derived data sets: A case study. In IEEE Visualization, pages
457–460, 2001.
[53] David Kao, Alison Luo, Jennifer L. Dungan, and Alex Pang. Visualizing spatially varying distribution data. In Sixth International Conference on Information Visualisation (IV 2002), page 219, 2002.
[54] David L. Kao, Marc G. Kramer, Alison L. Love, Jennifer L. Dungan, and Alex T.
Pang. Visualizing distributions from multi-return lidar data to understand forest
structure. Cartographic Journal, The, 42:35–47(13), June 2005.
[55] A. Keller. Fuzzy clustering with outliers. pages 143–147, 2000.
[56] P. Keller and M. Keller. Visual Cues. IEEE Press, 1992.
[57] B. Kitchenham, L.M. Pickard, S. Linkman, and P.W. Jones. Modeling soft-
ware bidding risks. IEEE Transactions on Software Engineering, 29(6):542–
554, June 2003.
[58] George Klir and Richard Smith. On measuring uncertainty and uncertainty-
based information: Recent developments. Annals of Mathematics and Artificial
Intelligence, 32(1):5–33, 2001.
178 BIBLIOGRAPHY
[59] George J. Klir. Generalized information theory: aims, results, and open prob-
lems. Reliability Engineering & System Safety, 85(1-3):21–38, 2004.
[60] George J. Klir. Uncertainty and Information: Foundations of Generalized In-
formation Theory. Wiley-Interscience, 2005.
[61] G.J. Klir. An update on generalized information theory. In Third International
Symposium on Imprecise Probabilities and Their Applications, number 18 in
Proceedings in Informatics, Lugano, Switzerland, 2003. Carleton Scientific.
[62] G.J. Klir and D. Harmanec. Generalized information theory: recent develop-
ments. Kybernetes, 25(7/8):50–67, 1996.
[63] G.J. Klir and M.J. Wierman. Uncertainty-Based Information: Elements of Gen-
eralized Information Theory. Springer, 1999.
[64] R. Kosara, S. Miksch, and H. Hauser. Focus+context taken literally. IEEE
COMPUTER GRAPHICS AND APPLICATIONS, 22:22–29, 2002.
[65] Marc Levoy. Spreadsheets for images. In SIGGRAPH ’94: Proceedings of
the 21st Annual Conference on Computer Graphics and Interactive Techniques,
pages 139–146, New York, NY, USA, 1994. ACM Press.
[66] S. K. Lodha, C. M. Wilson, and R. E. Sheehan. Listen: Sounding uncertainty
visualization. In Visualization, pages 189–95, San Francisco, California, 1996.
IEEE.
[67] S.K. Lodha, A. Pang, R.E. Sheehan, and C.M. Wittenbrink. Uflow: Visualizing
uncertainty in fluid flow. IEEE Visualization’96, pages 249–254, 1996.
[68] Suresh K. Lodha, Bob Sheehan, Alex T. Pang, and Craig M. Wittenbrink. Vi-
sualizing geometric uncertainty of surface interpolants. In Wayne A. Davis
and Richard Bartels, editors, Graphics Interface ’96, pages 238–245. Canadian
Human-Computer Communications Society, 1996.
[69] Adriano Lopes and Ken Brodlie. Mathematical Visualization, chapter Accuracy
in 3D Particle Tracing, pages 329–341. Springer-Verlag, Heidelberg, 1998.
[70] A. Lowe, R. Jones, and M. Harrison. The graphical presentation of decision
support information in an intelligent anaesthesia monitor. Artificial Intelligence
in Medicine, 22:173–191, 2001.
[71] Jan Łukasiewicz. Elements of Mathematical Logic. Translated from Polish by
Olgierd Wojtasiewicz. Macmillan, New York, 1964.
BIBLIOGRAPHY 179
[72] Alison Luo, David Kao, and Alex Pang. Visualizing spatial distribution data
sets. In VISSYM ’03: Proceedings of the symposium on Data visualisation
2003, pages 29–38, Aire-la-Ville, Switzerland, Switzerland, 2003. Eurographics
Association.
[73] K.-L. Ma. Visualization - a quickly emerging field. ACM Computer Graphics,
February:4–7, 2004.
[74] Alan M. MacEachren, Anthony Robinson, Susan Hopper, Steven Gardner,
Robert Murray, Mark Gahegan, and Elisabeth Hetzle. Visualizing geospatial
information uncertainty: What we know and what we need to know. Cartogra-
phy and Geographic Information Science, 32(3):139–160, July 2005.
[75] The MathWorks. Fuzzy Logic Toolbox For Use with MATLAB: User’s Guide
ver. 2. The MathWorks, Inc., 1999.
[76] Don L. McLeish. Monte Carlo Simulation and Finance. John Wiley & Sons,
Inc., 2005.
[77] J. M. Mendel. Uncertain Rule-Based Fuzzy Logic Systems. Prentice Hall PTR,
2001.
[78] Microsoft. Description of the stdev function in excel 2003. Help and Support
Article 826349, January 2007.
[79] Hung T. Nguyen and E. Walker. A first course in fuzzy logic. Chapman & Hall,
Boca Raton, FL, 2000.
[80] A. Nurnberger, A. Klose, and R. Kruse. Discussing cluster shapes of fuzzy
classifiers. In 18th International Conference of the North American Fuzzy In-
formation Processing Society, pages 546–550, July 1999.
[81] A. Nurnberger, A. Klose, and R. Kruse. Effects of antecedent pruning in
fuzzy classification systems. In Proceedings of 4th International Conference
on Knowledge-Based Intelligent Engineering Systems & Allied Technologies,
pages 154–157, 2000.
[82] J. Ohene-Djan, A. Sammon, and R. Shipsey. Colour spectrums of opinion: An
information visualisation interface for representing degrees of emotion in real
time. In Information Visualization, pages 80–88, July 2006.
[83] C. Olston and JD Mackinlay. Visualizing data with bounded uncertainty. Infor-
mation Visualization, 2002. INFOVIS 2002. IEEE Symposium on, pages 37–40,
2002.
180 BIBLIOGRAPHY
[84] A. Pang and N. Alper. Bump mapped vector fields. 1995.
[85] Alex Pang and Adam Freeman. Methods for comparing 3d surface attributes.
volume 2656, pages 58–64. SPIE, 1996.
[86] Alex T. Pang, Craig M. Wittenbrink, and Suresh K. Lodha. Approaches to
uncertainty visualization. The Visual Computer, 13:370–390, 1997.
[87] R. R. Panko. Two corpuses of spreadsheet errors. pages 8 pp. vol.1–, 2000.
[88] R. R. Panko and Jr. Halverson, R. P. Individual and group spreadsheet design:
patterns of errors. volume 4, pages 4–10, 1994.
[89] Zdzisaw Pawlak. Rough sets : theoretical aspects of reasoning about data.
Kluwer Academic Publishers, Dordrecht & Boston, 1991.
[90] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann,
1988.
[91] B. Pham and R. Brown. Analysis of visualisation requirements for fuzzy sys-
tems. In Proceedings of the 1st International Conference on Computer Graphics
and Interactive Techniques in Austalasia and South East Asia, pages 181–187,
2003.
[92] B. Pham, A. Streit, and R Brown. Interactive Visualisation: A State-of-the-Art
Survey, chapter Visualisation of Information Uncertainty: Progress and Chal-
lenges. Springer, UK, 2007.
[93] Binh Pham and Ross Brown. Multi-agent approach for visualisation of fuzzy
systems. In International Conference on Computational Science. Lecture Notes
in Computer Science, June 2-4 2003.
[94] Binh Pham and Ross Brown. Visualisation of fuzzy systems: requirements,
techniques and framework. Future Generation Computer Systems, 21(7):1199–
1212, 2005.
[95] Kurt W. Piersol. Object-oriented spreadsheets: the analytic spreadsheet pack-
age. In OOPLSA ’86: Conference Proceedings on Object-oriented Program-
ming Systems, Languages and Applications, pages 385–390, New York, NY,
USA, 1986. ACM Press.
[96] Lech Polkowski. Rough Sets: Mathematical Foundations. Physica-Verlag,
2002.
BIBLIOGRAPHY 181
[97] Bruno R. Preiss. Data structures and algorithms : with object-oriented design
patterns in C++. Wiley, 1999.
[98] Ramana Rao and Stuart K. Card. The table lens: merging graphical and sym-
bolic representations in an interactive focus + context visualization for tabular
information. In CHI ’94: Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, pages 318–322, New York, NY, USA, 1994.
ACM Press.
[99] M. Reed and D. Heller. Olive: Online library of information visualization envi-
ronments. Retrieved 15 May 2004 from ���������������� �� �������,
1997.
[100] T. Reinhardt and N. Pillay. Analysis of spreadsheet errors made by computer
literacy students. pages 852–853, 2004.
[101] L. Reznik and B. Pham. Fuzzy models in evaluation of information uncertainty
in engineering and technology applications. volume 2, pages 972–975 vol.3,
2001.
[102] George G. Robertson, Jock D. Mackinlay, and Stuart K. Card. Cone trees:
animated 3d visualizations of hierarchical information. In Proceedings of the
SIGCHI conference on Human factors in computing systems: Reaching through
technology, New Orleans, Louisiana, United States, 1991. ACM Press.
[103] Daniel M. Russell, Mark J. Stefik, Peter Pirolli, and Stuart K. Card. The cost
structure of sensemaking. In CHI ’93: Proceedings of the SIGCHI conference
on Human factors in computing systems, pages 269–276, New York, NY, USA,
1993. ACM Press.
[104] A.R. Sanderson, C.R. Johnson, and R.M. Kirby. Display of vector fields using
a reaction-diffusion model. In Visualization, 2004. IEEE, pages 115–122, 2004.
[105] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit, Third
Edition. Kitware Inc., 2004.
[106] Hikmet Senay and Eve Ignatius. Rules and Principles of Scientific Data Visu-
alization. Washington, D.C.: Institute for Information Science and Technology,
Dept. of Electrical Engineering and Computer Science, School of Engineering
and Applied Science, George Washington University, 1990.
[107] G. Shaefer. Perspectives on the theory and practice of belief functions. Interna-
tional Journal of Approximate Reasoning, 3:1–40, 1990.
182 BIBLIOGRAPHY
[108] B. Shneiderman. The eyes have it: a task by data type taxonomy for information
visualizations. In IEEE Symposium on Visual Languages, pages 336–343. IEEE
Press, Sep 1996.
[109] T.A. Slocum, D.C. Cliburn, J.J. Feddema, and J.R. Miller. Evaluating the usabil-
ity of a tool for visualizing the uncertainty of the future global water balance.
Cartography and Geographic Information Science, 30(4):299–318, 2003.
[110] Alexander Streit, Binh Pham, and Ross Brown. Visualization support for man-
aging large business process specifications. In Proceedings 3rd International
Conference, Business Process Management, BPM2005, pages 206–219, Nancy,
France, September 2005. Springer Verlag.
[111] Randall J. Swift and Malempati Madhusudana Rao. Probability Theory with
Applications. Springer, revised edition edition, 2006.
[112] Barry N. Taylor and Chris E. Kuyatt. Guidelines for evaluating and express-
ing the uncetainty of NIST measurement results (1994 edition). Technical
Note 1297, National Institute of Standards and Technology, Gaithersburg, MD,
September 1994.
[113] A Thomas. Visualisation and Modelling, chapter Contouring Algorithms for
Visualisation and Shape Modelling Systems, pages 99–175. Academic Press,
San Diego, USA, 1997.
[114] J. Thomson, E. Hetzler, A. MacEachren, M. Gahegan, and M. Pavel. A typology
for visualizing uncertainty. Proc. SPIE, 5669:146–157, 2005.
[115] M. Tory and T. Möller. Rethinking visualization: A high-level taxonomy. In
IEEE Symposium on Information Visualization, pages 151–158. IEEE Press,
Oct 2004.
[116] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive
Psychology, 12:97–136, 1980.
[117] E. Tufte. The Visual Display of Quantitative Information. Graphics Press,
Cheshire, USA, 1983.
[118] Edward R. Tufte. Envisioning Information. Graphics Press, May 1990.
[119] Wil van der Aalst. Business process management demystified: A tutorial on
models, systems and standards for workflow management. In Lectures on Con-
currency and Petri Nets, volume 3098, pages 1–65. Springer Verlag, Berlin,
2004.
BIBLIOGRAPHY 183
[120] Wil van der Aalst and Arthur ter Hofstede. Yawl: Yet another workflow lan-
guage. In Information Systems, volume 30, June 2005.
[121] Wil M.P. van der Aalst, Arthur H.M. ter Hofstede, and Mathias Weske. Business
process management: A survey. pages 1019–1019, 2003.
[122] Jarke J. van Wijk and Robert van Liere. Hyperslice: visualization of scalar
functions of many variables. In VIS ’93: Proceedings of the 4th Conference on
Visualization ’93, pages 119–125, 1993.
[123] A. Varshney and A. Kaufman. Finesse: a financial information spreadsheet. In
INFOVIS ’96: Proceedings of the 1996 IEEE Symposium on Information Visual-
ization (INFOVIS ’96), page 70, Washington, DC, USA, 1996. IEEE Computer
Society.
[124] B. Wandell. Foundations of Human Vision. Sinauer, Sunderland, USA, 1995.
[125] C. Wittenbrink, A. Pang, and S. Lodha. Glyphs for visualizing uncertainty
in vector fields. IEEE Transactions on Visualization and Computer Graphics,
2(3):266–279, 1996.
[126] L. Zhou and A. Pang. Metrics and visualization tools for surface mesh compari-
son. Technical report, Computer Science Department, University of California,
Santa Cruz, 2000.
184 BIBLIOGRAPHY
APPENDIX A
First Survey
This is the first survey, which was sent out to participants from a variety of fields.
Projecting Sales

A sales projection scenario is given. Sales (estimated) is the projected sales. Expenses is composed of two parts: fixed expenses and sales expenses (a percentage of sales). The net income is calculated as sales minus expenses.
The images below show the scenario using a spreadsheet. There are four quarters. The field labeled “rate” is the sales expenses rate – a percentage of sales.
Now we want to perform a sensitivity analysis on net income based on the sales expenses (the rate field). We wish to investigate the difference that 5 percentage points can make to net income. Only the rate field was changed; the other cells updated automatically, since they are calculated from formulae.
Illustration 1: Typical spreadsheet for the scenario
Illustration 2: Typical line graph
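The scenario's calculation can be sketched in a few lines. This is an illustrative reconstruction, not the survey's actual spreadsheet (which is shown only as images); the quarterly figures are the ones given in the second survey's task data, and rates are expressed in percent.

```python
# Illustrative reconstruction of the scenario's formulae (an assumption,
# based on the description above): net income = sales - (fixed + rate * sales).

SALES = [1200, 1000, 1500, 1800]   # estimated sales for Q1..Q4 (from the task data)
FIXED = 500                        # fixed expenses per quarter (from the task data)

def net_income(sales, rate_pct):
    """Net income = sales - (fixed expenses + sales expenses)."""
    sales_expenses = rate_pct * sales / 100
    return sales - (FIXED + sales_expenses)

# Sensitivity: what difference do 5 percentage points make to net income?
for rate in (10, 15):
    print(rate, [net_income(s, rate) for s in SALES])
# prints:
# 10 [580.0, 400.0, 850.0, 1120.0]
# 15 [520.0, 350.0, 775.0, 1030.0]
```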
To further the example, we wish to investigate the impact that variance in sales might have. We change each sale into an interval:
Illustration 3: Rate is an interval: 15 ± 5%
Illustration 4: Net income is an interval
Illustration 5: Sales and Sales Expenses can vary
The graph shows the increased variance. In the previous examples we have used the variance format (base ± variance), but we can also define intervals using lower and upper bounds. For example:
The graph for this is identical to Illustration 6.
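The two notations describe the same interval, so converting between them is mechanical. A minimal sketch (the function names here are illustrative, not part of the surveyed software):

```python
def variance_to_bounds(base, variance):
    """Convert 'base +- variance' form to 'lower and upper bounds' form."""
    return (base - variance, base + variance)

def bounds_to_variance(lower, upper):
    """Convert bounds form back to (base, variance)."""
    return ((lower + upper) / 2, (upper - lower) / 2)

print(variance_to_bounds(15, 5))   # (10, 20)
print(bounds_to_variance(10, 20))  # (15.0, 5.0)
```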
Questions

The above example shows a simple sensitivity analysis scenario. I would be very appreciative if you could answer the following subjective questions.
1. Did the interval approach make sense to you? Was it intuitive enough to look at the screenshots and follow the logic?
2. For the purpose of sensitivity analysis, would you consider this a more useful tool than a standard spreadsheet?
3. Would you like to use a tool like this? Why / why not?
4. What would you change?
5. (optional) Do you have any other comments about what you have seen?
Illustration 6: Variance is greater when both sales and sales expenses can vary
Illustration 7: Sales are in "lower and upper bounds" format
APPENDIX B
Second Survey
This is the second survey, in which financial experts were asked to complete tasks
using our software. The participants were monitored and were offered assistance when
requested.
IvySheet Survey
Entrance Questionnaire
NAME
Please rate your experience in the following areas (None / Beginner / Intermediate / Advanced):

Spreadsheet Software (e.g. Microsoft Excel)
Visualization Systems (e.g. Spotfire, AVS)
Uncertainty Modeling (e.g. statistical methods)
Financial Modeling (e.g. forecasting, sensitivity analysis)
IvySheet Survey
Task Description
Thank you for taking part in this survey. The purpose of this survey is to evaluate the approaches used to create the IvySheet software package. IvySheet is a spreadsheet system that lets you enter uncertainty information about variables. For example, a variable can be modeled as an interval or a normal probability distribution.
Today you will be asked to create a spreadsheet and graphs for a sales sensitivity scenario. All of the information you need is provided for you below. Some statistics about your usage will be recorded and at the end you will be asked questions about your experience. You may request assistance at any time.
The Task

A sales projection scenario is given:

Sales (estimated) are the projected gross sales.
Fixed expenses are overheads incurred.
Sales expenses are the variable costs incurred. The rate is currently 15% of gross sales, but this will change.
Net income is calculated from sales minus fixed expenses minus variable expenses.
There are four quarters.
The data follow:
Rate: 15%

Quarter   Sales   Fixed expenses
Q1        1200    500
Q2        1000    500
Q3        1500    500
Q4        1800    500
Step 1

Create a spreadsheet for the scenario.

Step 2

Create a line graph for the net income. To do this:

1. Navigate to an empty cell in the spreadsheet.
2. Choose “Create Line Graph” from the “Visualization” menu. You can optionally enter text in the title and axis label fields.
3. In the data field, enter the range of cells. For example, “D4:D8”.
Step 3

Navigate to the rate field and type “[5..10]”. This changes the rate into an interval of values between 5 and 10. Intervals can be entered into any cell using the format [number..number].
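The “[number..number]” cell syntax described in Step 3 can be parsed along these lines. This is a hypothetical sketch; IvySheet's real parser is not shown in this document, and the names here are illustrative.

```python
import re

# Pattern for the "[number..number]" interval syntax, e.g. "[5..10]".
INTERVAL = re.compile(r"\[\s*(-?\d+(?:\.\d+)?)\s*\.\.\s*(-?\d+(?:\.\d+)?)\s*\]")

def parse_cell(text):
    """Return (lower, upper) for an interval entry, or a plain float otherwise."""
    match = INTERVAL.fullmatch(text.strip())
    if match:
        return (float(match.group(1)), float(match.group(2)))
    return float(text)

print(parse_cell("[5..10]"))   # (5.0, 10.0)
print(parse_cell("15"))        # 15.0
```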
Please answer the following questions, making whatever adjustments to the spreadsheet you deem necessary.
1. What are the minimum and maximum amounts for net income in Q4 if the rate is between 10 and 20%?
2. What is the minimum net income for Q3 if the rate is between 10 and 20% AND the sales for that quarter are between 1000 and 2000?
3. What are the minimum and maximum amounts of net income for Q3, if the rate is between 15 and 18%, the sales are between 1500 and 1800, and the fixed expenses are between 500 and 800?
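These three questions can be checked by hand: net income is monotone in each input, so its extremes occur at corners of the input intervals. A sketch of that check, with rates in percent (the expected answers agree with the observation sheet later in this appendix):

```python
from itertools import product

def net_income(sales, rate_pct, fixed):
    """Net income = sales - fixed expenses - sales expenses (rate in percent)."""
    return sales - fixed - rate_pct * sales / 100

def bounds(sales_iv, rate_iv, fixed_iv):
    """Min/max net income over all corners of the three input intervals."""
    corners = [net_income(s, r, f)
               for s, r, f in product(sales_iv, rate_iv, fixed_iv)]
    return min(corners), max(corners)

# Question 1: Q4 sales 1800, fixed 500, rate in [10..20]%
print(bounds((1800, 1800), (10, 20), (500, 500)))   # (940.0, 1120.0)
# Question 2: Q3 sales in [1000..2000], rate in [10..20]%
print(bounds((1000, 2000), (10, 20), (500, 500)))   # (300.0, 1300.0)
# Question 3: sales [1500..1800], rate [15..18]%, fixed [500..800]
print(bounds((1500, 1800), (15, 18), (500, 800)))   # (430.0, 1030.0)
```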
IvySheet Survey
Questions
NAME
Please rate how much you agree with the following statements. For each statement, tick one box only. There is space for you to write comments at the end.
Q1. Overall, I found the system to be intuitive.
Q2. Placing an interval in a cell made sense to me.
Q3. I would need more training to complete the tasks.
Q4. I found it easy to follow the effect that an interval had on the rest of the spreadsheet.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)
Q5. The graph helped me to understand the effects better.
Q6. I would definitely use the graph if there were many numbers.
Q7. The spreadsheet became too cluttered or hard to read.
Q8. This tool makes it easier to make mistakes.
Q9. I would use a tool like this if it were available to me.
Q10. I would use intervals in Microsoft Excel if they were supported.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)
Q11. My current tools are adequate for modeling and visualizing uncertainty.
Q12. I would use normal probability distributions more often in my work if I had a tool like this.
Q13. I would model uncertainty more often if I had these features available.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)

Comments

Please feel free to write any comments below.
IvySheet Survey
Observations Sheet [not to be shown to participants]
NAME
Step 1 Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
Step 2 Number of attempts made:
Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
Step 3 Answer 1: (Should be 940 and 1120)
Answer 2: (Should be 300)
Answer 3: (Should be 430 and 1030)
Did they use intervals to get all three answers?
Number of edits made:
Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
IvySheet Survey

Results
Participants

Entrance        RS   KF   JH   Average   Stddev
Spreadsheets     4    4    3      3.67     0.58
Visualization    2    2    1      1.67     0.58
Uncertainty      3    3    2      2.67     0.58
Financials       4    4    3      3.67     0.58
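The Average and Stddev columns in these tables are consistent with the sample standard deviation (the n-1 form), which can be verified directly; for example, the Spreadsheets entrance row:

```python
from statistics import mean, stdev   # statistics.stdev uses the sample (n-1) form

# Entrance ratings for spreadsheet software, participants RS, KF, JH (from the table above).
spreadsheets = [4, 4, 3]

print(round(mean(spreadsheets), 2))   # 3.67
print(round(stdev(spreadsheets), 2))  # 0.58
```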
Question   RS   KF   JH   Average   Stddev
1           5    5    4      4.67     0.58
2           5    4    4      4.33     0.58
3           1    1    1      1.00     0.00
4           4    4    4      4.00     0.00
5           4    3    4      3.67     0.58
6           5    4    5      4.67     0.58
7           1    2    2      1.67     0.58
8           1    1    1      1.00     0.00
9           5    4    5      4.67     0.58
10          5    5    5      5.00     0.00
11          1    3    2      2.00     1.00
12          5    5    4      4.67     0.58
13          5    4    4      4.33     0.58
Performance     RS   KF   JH   Average   Stddev

Step 1
errors           0    1    0      0.33     0.58
corrections      4    2    1      2.33     1.53
help             0    0    0      0.00     0.00
time (m)         6    7   11      8.00     2.65

Step 2
attempts         1    1    1      1.00     0.00
errors           0    0    0      0.00     0.00
corrections      0    0    0      0.00     0.00
help             0    0    1      0.33     0.58
time (m)         1    1    1      1.00     0.00

Step 3
Ans1 right       y    y    y
Ans2 right       y    y    y
Ans3 right       y    y    y
intervals        y    y    y
edits            4    3    3      3.33     0.58
errors           0    0    0      0.00     0.00
corrections      1    0    0      0.33     0.58
help             0    0    0      0.00     0.00
time (m)         3    4    5      4.00     1.00