Encapsulation and Abstraction for Modeling and Visualizing Information Uncertainty
Alexander Streit
Bachelor of Information Technology (Honours)
Queensland University of Technology
A thesis submitted in partial fulfilment of the requirements for the degree of
Doctor of Philosophy
March 2008
Principal Supervisor: Prof. Binh Pham
Associate Supervisor: Dr. Ross Brown
Faculty of Information Technology
Queensland University of Technology
Brisbane, Queensland, AUSTRALIA
© Copyright by Alexander Streit 2008
All Rights Reserved
Dedication
For Jasper, Fral, and Jilli.
Keywords
Information Uncertainty Visualization, Information Uncertainty Modeling, Spreadsheets, Visualization Spreadsheets, Uncertainty Visualization Spreadsheets, Visualization Tools, Modeling Tools, Uncertainty Modeling, Uncertainty Visualization, Probability, Fuzzy Visualization, Visualization Frameworks, Visualization
Abstract
Information uncertainty is inherent in many real-world problems and adds a layer of
complexity to modeling and visualization tasks. This often causes users to ignore
uncertainty, especially when it comes to visualization, thereby discarding valuable
knowledge. A coherent framework for the modeling and visualization of information
uncertainty is needed to address this issue.
In this work, we have identified four major barriers to the uptake of uncertainty
modeling and visualization. Firstly, there are numerous uncertainty modeling tech-
niques, and users are required to anticipate their uncertainty needs before building their
data model. Secondly, parameters of uncertainty tend to be treated at the same level
as variables, making it easy to introduce avoidable errors. This causes the uncertainty
technique to dictate the structure of the data model. Thirdly, propagation of uncertainty
information must be manually managed. This requires user expertise, is error prone,
and can be tedious. Finally, uncertainty visualization techniques tend to be developed
for particular uncertainty types, making them largely incompatible with other forms
of uncertainty information. This narrows the choice of visualization techniques and
results in a tendency for ad hoc uncertainty visualization.
The aim of this thesis is to present an integrated information uncertainty modeling
and visualization environment that has the following main features: information and
its uncertainty are encapsulated into atomic variables, the propagation of uncertainty is
automated, and visual mappings are abstracted from the uncertainty information data
type.
Spreadsheets have previously been shown to be well suited as an approach to visu-
alization. In this thesis, we devise a new paradigm extending the traditional spreadsheet
to intrinsically support information uncertainty.
Our approach is to design a framework that integrates uncertainty modeling tech-
niques into a hierarchical order based on levels of detail. The uncertainty information
is encapsulated and treated as a unit allowing users to think of their data model in terms
of the variables instead of the uncertainty details. The system is intrinsically aware of
the encapsulated uncertainty and is therefore able to automatically select appropriate
uncertainty propagation methods.
A user-objectives-based approach to uncertainty visualization is developed to guide
the visual mapping of abstracted uncertainty information. Two main abstractions of
uncertainty information are explored for the purpose of visual mapping: the Unified
Uncertainty Model and the Dual Uncertainty Model. The Unified Uncertainty Model
provides a single view of uncertainty for visual mapping, whereas the Dual Uncertainty
Model distinguishes between possibilistic and probabilistic views. Such abstractions
provide a buffer between the visual mappings and the uncertainty type of the underly-
ing data, enabling the user to change the uncertainty detail without causing the visual-
ization to fail.
Two main case studies are presented. The first case study covers exploratory
and forecasting tasks in a business planning context. The second case study inves-
tigates sensitivity analysis for financial decision support. Two minor case studies are
also included: one to investigate the relevancy visualization objective applied to busi-
ness process specifications, and the second to explore the extensibility of the system
through General Purpose Graphics Processing Unit (GPGPU) use. A quantitative
analysis compares our approach to traditional analytical and numerical spreadsheet-
based approaches. Two surveys were conducted to gain feedback from potential users.
The significance of this work is that we reduce barriers to uncertainty modeling
and visualization in three ways. Users do not need a mathematical understanding of
the uncertainty modeling technique to use it; uncertainty information is easily added,
changed, or removed at any stage of the process; and uncertainty visualizations can be
built independently of the uncertainty modeling technique.
Publications
1. Pham, B. and Streit, A. and Brown, R. “Visualisation of Information Uncertainty: Progress and
Challenges,” in Interactive Visualisation: A State-of-the-Art Survey, Elena Zudilova-Seinstra,
Tony Adriaansen and Robert van Liere (eds.), 2007, Springer, UK. In Print.
2. Streit, A. and Pham, B. and Brown, R. “A Spreadsheet Approach to Facilitate Visualization of
Uncertainty in Information,” IEEE Transactions on Visualization and Computer Graphics, vol.
14, no. 1, pp. 61-72, Jan/Feb 2008.
3. Streit, A. and Pham, B. and Brown, R. Visualisation Support for Managing Large Business Pro-
cess Specifications. International Conference on Business Process Management (BPM). Nancy,
France, September 6-8, 2005. Lecture Notes in Computer Science, Springer. Acceptance rate:
13%
4. Campbell, A. and Berglund, E. and Streit, A. Graphics Hardware Implementation of the Parameter-
Less Self-Organising Map. International Conference on Intelligent Data Engineering and Au-
tomated Learning (IDEAL’05). Brisbane, July 6-8, 2005. Pages 343-350. Lecture Notes in
Computer Science, Springer.
Acknowledgments
This thesis would not have been possible without my principal supervisor, Prof. Binh
Pham, and my associate supervisor, Dr. Ross Brown. Both collaborated to teach me
their process for completing research projects, invaluable knowledge for which I am
very grateful.
I wish especially to thank Fral, who supported me even when it didn’t seem rational
to do so, and my mother, Jilli, who should really be receiving this degree herself. I
also wish to thank my Honours supervisor, Ruth Christie, who inspired me to pursue
postgraduate studies in the first instance.
I wish to thank my colleague, Dr. Robert Smith, who provided me with extensive
insight and feedback, and Alexander Campbell for his many comments and suggestions.
Finally, I wish to thank my business associate, Dr. Andy Boud, for acting as an unof-
ficial mentor.
Abbreviations
ASP Analytical Spreadsheet Package
DUM Dual Uncertainty Model
EBNF Extended Backus-Naur Form
GIS Geographic Information Systems
GPGPU General Purpose Graphics Processing Unit
LIC Line Integral Convolution
NIST National Institute of Standards and Technology
PDF Probability Density Function
QUM Quad Uncertainty Model
SI The Spreadsheet for Images
SIV Spreadsheet for Information Visualization
UML Unified Modeling Language
UUM Unified Uncertainty Model
VTK The Visualization Toolkit
Contents
Abstract vii
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Information Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Sources of Information Uncertainty . . . . . . . . . . . . . . 9
2.2.2 Understanding Information Uncertainty . . . . . . . . . . . . 10
2.2.3 Approaches to Modeling Information Uncertainty . . . . . . . 11
2.3 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 The Sensemaking Process . . . . . . . . . . . . . . . . . . . 17
2.3.2 Visualization Techniques . . . . . . . . . . . . . . . . . . . . 20
2.4 Information Uncertainty Visualization Approaches . . . . . . . . . . 24
2.4.1 Low-level Features . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.2 Higher-level Constructions . . . . . . . . . . . . . . . . . . . 28
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Framework for Integrated Uncertainty Modeling and Visualization 33
3.1 A New Approach to Information Uncertainty . . . . . . . . . . . . . 33
3.2 Analysis of Issues and Requirements . . . . . . . . . . . . . . . . . . 35
3.2.1 Ad hoc Visualization Techniques . . . . . . . . . . . . . . . . 35
3.2.2 Incoherence of Uncertainty Models . . . . . . . . . . . . . . 38
3.2.3 Artificial Separation of Information and Uncertainty . . . . . 39
3.3 Components of the Framework . . . . . . . . . . . . . . . . . . . . . 40
3.3.1 Spreadsheet Paradigm . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Uncertainty Encapsulation . . . . . . . . . . . . . . . . . . . 42
3.3.3 Uncertainty Abstraction . . . . . . . . . . . . . . . . . . . . 42
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Spreadsheet Paradigm for Information Uncertainty 45
4.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Related Work on Spreadsheets . . . . . . . . . . . . . . . . . . . . . 46
4.3 Architecture and Features . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.1 Uncertainty Encapsulation . . . . . . . . . . . . . . . . . . . 48
4.3.2 Uncertainty Abstraction . . . . . . . . . . . . . . . . . . . . 50
4.4 New Process and Workflow . . . . . . . . . . . . . . . . . . . . . . . 52
4.5 Capabilities and Advantages . . . . . . . . . . . . . . . . . . . . . . 53
4.6 Case Study: Financial Decision Support . . . . . . . . . . . . . . . . 55
5 Uncertainty Encapsulation and Automated Propagation 61
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Unified Information Uncertainty Framework . . . . . . . . . . . . . . 62
5.2.1 Conceptualizing Information Uncertainty and its Usage . . . . 62
5.2.2 Categorization of Uncertainty Models . . . . . . . . . . . . . 66
5.2.3 Data Structures for Information Uncertainty . . . . . . . . . . 69
5.3 Automated Propagation of Information Uncertainty . . . . . . . . . . 73
5.3.1 Uncertainty Propagation Model . . . . . . . . . . . . . . . . 73
5.3.2 Hierarchical Heterogeneous Propagation . . . . . . . . . . . 74
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6 Uncertainty Abstraction for Visualization 81
6.1 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 User-objectives for Information Uncertainty Visualization . . . . . . . 82
6.2.1 Analysis of User-Objectives . . . . . . . . . . . . . . . . . . 83
6.2.2 A Computer Assisted User-Objectives Selection Method . . . 86
6.3 Uncertainty Abstraction Models . . . . . . . . . . . . . . . . . . . . 88
6.3.1 The Unified Uncertainty Model . . . . . . . . . . . . . . . . 88
6.3.2 The Dual Uncertainty Model . . . . . . . . . . . . . . . . . . 89
6.3.3 Design and Use . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.3.4 Alternative Models . . . . . . . . . . . . . . . . . . . . . . . 94
6.4 Case Study: User-Objectives in Financial Decision Support . . . . . . 95
6.5 Case Study: Relevancy Objective in Business Process Management . 104
7 Integration of Core Features 113
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2 Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3.1 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.3.2 Core Components . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.3 Plugin Components . . . . . . . . . . . . . . . . . . . . . . . 124
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8 Advanced Features and Extensibility 129
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.2 Advanced Features . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.2.1 Hierarchical Spreadsheets . . . . . . . . . . . . . . . . . . . 130
8.2.2 Floating Observers and Embedded Visualizations . . . . . . . 133
8.2.3 Customization . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.3 Extensibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.4 Case Study: GPGPUSheet . . . . . . . . . . . . . . . . . . . . . . . 142
9 Evaluation 145
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
9.2 Quantitative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.2.1 Construction Experiments . . . . . . . . . . . . . . . . . . . 148
9.2.2 Retrospection Experiments . . . . . . . . . . . . . . . . . . . 152
9.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.3 Sensitivity Analysis Surveys . . . . . . . . . . . . . . . . . . . . . . 157
9.3.1 First Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.3.2 Second Survey . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.4 Case Study: Business Planning . . . . . . . . . . . . . . . . . . . . . 162
10 Conclusion and Future Work 169
10.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
10.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
10.3 Possible Applications and Extensions . . . . . . . . . . . . . . . . . 170
Bibliography 173
A First Survey 185
B Second Survey 189
List of Figures
2.1 Fuzzy Set for Hot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Fuzzification for Temperature . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Results of Fuzzy Operations are Shown by the Grey Shaded Regions . 15
2.4 Defuzzification Using an α-cut . . . . . . . . . . . . . . . . . . . . . 16
2.5 Example Rough Set for Containment of a Region . . . . . . . . . . . 16
2.6 Four agents work together in the Visualization Support System . . . . 18
2.7 The Visualization Task Network (VTN) Learns Task-oriented Visual-
ization Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.8 An Ontology of Visualization . . . . . . . . . . . . . . . . . . . . . . 19
2.9 Visualization techniques categorized by the type of data to be visual-
ized [41] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.10 Selected examples of visualization techniques . . . . . . . . . . . . . 21
2.11 The Model-based Visualization Taxonomy . . . . . . . . . . . . . . . 23
2.12 Relationship between Uncertainty Visualization, Information Uncer-
tainty Visualization, Error Visualization, and Fuzzy Visualization . . . 24
2.13 Using opacity to show the structure of uncertainty. Color scheme (left),
Normal rendering (centre), Uncertainty structure (right) . . . . . . . . 26
2.14 Some visual mappings for showing difference. From left to right: over-
lay, rainbow mapping, white-black-white pseudo-coloring, glyph (hi-
pass filter), glyph (low-pass filter) . . . . . . . . . . . . . . . . . . . 27
2.15 How much tip should be given based on the quality of the food and
service using fuzzy inference . . . . . . . . . . . . . . . . . . . . . . 28
2.16 Two frames from an animation that uses a shimmering effect to indi-
cate uncertainty by oscillating luminosity in regions of high uncertainty 30
2.17 A visualization that draws the probability density function over asso-
ciated data points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Visualizations of Employment Numbers in California. Years 2005-
2010 are predicted. (a) Assuming Average Growth (b) Indicating Growth
is Estimated (c) Possible Growth (d) Likely Growth. (Data Source:
California Employment Development Department) . . . . . . . . . . 37
3.2 Description of the framework . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Additional Layer in Spreadsheet Hierarchy . . . . . . . . . . . . . . 41
3.4 Screenshot of the Prototype System . . . . . . . . . . . . . . . . . . 44
4.1 Basic Cell Type Object Hierarchy . . . . . . . . . . . . . . . . . . . 48
4.2 Novel CellType Object Hierarchy . . . . . . . . . . . . . . . . . . . 48
4.3 Screen-shot of the Prototype . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Visualization Sheet for the Graph in Figure 4.7 . . . . . . . . . . . . 51
4.5 Process for Constructing an Uncertainty Spreadsheet . . . . . . . . . 53
4.6 Interval Modeling Example: (a) Original Model (b) Traditional Spread-
sheet (c) Prototype System Uncertainty Hidden (d) Prototype System
Uncertainty Shown . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Using An Interval (±0.5) for Annual Change in Interest Rates Propa-
gates the Uncertainty to NPV . . . . . . . . . . . . . . . . . . . . . . 58
4.8 Volumetric Representation of the Most Likely Effect Interest Rate Changes
Will Have on NPV. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1 Progression Through States of Information Uncertainty (Boxes) as a
Result of Information (Arrows) . . . . . . . . . . . . . . . . . . . . . 63
5.2 Projection of Information Uncertainty onto an Estimate Point . . . . . 64
5.3 All Collapses of the Interval [4.5,5.5] . . . . . . . . . . . . . . . . . 65
5.4 All Collapses of a Fuzzy Number Around 5 . . . . . . . . . . . . . . 65
5.5 Collapses for a Fuzzy Number Around 5, Using a Cut Plane . . . . . 65
5.6 The Interval Data Structure . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Definition of Continuous Rough Set Using a Marker Sequence . . . . 72
5.8 Definition of Linearly Defined Fuzzy Set Using a Marker Sequence . 72
5.9 The information uncertainty modeling techniques sorted into three strata 75
5.10 Example of Increasing Levels of Uncertainty Information . . . . . . . 76
5.11 Sample Promotion/Demotion Graph . . . . . . . . . . . . . . . . . . 78
6.1 Graphs illustrating the Visual Treatment of Information with Variable
Degrees of Uncertainty under Different Objectives. . . . . . . . . . . 84
6.2 Schematic Illustration of the Dual Uncertainty Model . . . . . . . . . 90
6.3 UML Diagram of the Dual Uncertainty Model . . . . . . . . . . . . . 92
6.4 Illustration of the Quad Uncertainty Model . . . . . . . . . . . . . . . 94
6.5 Example of a Recursive UUM . . . . . . . . . . . . . . . . . . . . . 95
6.6 Possible effects of interest rate movements on NPV (2D). . . . . . . 97
6.7 Possible effects of house price movements on NPV (2D). . . . . . . . 99
6.8 Possible effects of house price movements on NPV (3D). . . . . . . . 99
6.9 Possible effects of house prices and interest rates on NPV. . . . . . . . 99
6.10 Most likely profitability resulting from changes in interest rates . . . . 100
6.11 Volumetric representation of the most likely effect interest rate changes
will have on NPV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.12 Likelihood of NPV . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.13 Effect of interest rate changes grouped into 5 year periods . . . . . . . 103
6.14 Optimum time to sell the property under different economic conditions. 106
6.15 Architecture of the case study system . . . . . . . . . . . . . . . . . 106
6.16 YAWL query: Prototype tool for the graphical business specification
reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.17 Production rules for reducing a YAWL graph . . . . . . . . . . . . . 108
6.18 Original graph prior to simplification . . . . . . . . . . . . . . . . . . 110
6.19 Reduced specification using collapse approach (α = 2.5) . . . . . . . 111
6.20 Reduced specification for text query “legal” using decimation approach
(β = 0.5, α = 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.1 Design of the User Interface . . . . . . . . . . . . . . . . . . . . . . 114
7.2 The Spreadsheet Architecture . . . . . . . . . . . . . . . . . . . . . . 115
7.3 High-level Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 View and Controller Classes for the Spreadsheet . . . . . . . . . . . . 117
7.5 The Core Components . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.6 UML Inheritance Diagram for the Kernel Class . . . . . . . . . . . . 118
7.7 Main Classes in the Datamodel Component . . . . . . . . . . . . . . 119
7.8 Relationship of the Dependency Graph to Cells and CellContainers . . 120
7.9 Cell C is Dependent on A Multiple Times . . . . . . . . . . . . . . . 120
7.10 UML Inheritance Diagram for the DependencyGraph Class . . . . . . 121
7.11 Example Formula and its CodeTree . . . . . . . . . . . . . . . . . . 123
7.12 Formula Language Definition . . . . . . . . . . . . . . . . . . . . . . 123
7.13 UML Diagram for the CodeTree Class . . . . . . . . . . . . . . . . . 124
7.14 UML Inheritance Diagram for the Propagation Models . . . . . . . . 124
7.15 The IPropagationMethod Class . . . . . . . . . . . . . . . . . . . . . 124
7.16 The IPlugin Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.17 The ICellType Interface . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.18 The IDUMMethod Interface . . . . . . . . . . . . . . . . . . . . . . 126
7.19 The UncertaintyRange and UncertaintyRangeSet Classes . . . . . . . 126
7.20 UML Inheritance Diagram for the IVisualElement Interface . . . . . . 127
8.1 Hierarchical Spreadsheet Prototype. The Parent Sheet (Left) Contains
the Child Sheet (Right) . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.2 Floating Observer, Observing the Uncertainty Line Graph in Cell D14 134
8.3 Dependency Tree for the Floating Observer from Figure 8.2 . . . . . 135
8.4 CellType List Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.5 Propagation Model Editor . . . . . . . . . . . . . . . . . . . . . . . . 137
8.6 Propagation Model, Method, and Model Set . . . . . . . . . . . . . . 138
8.7 Propagation Model Set Editor . . . . . . . . . . . . . . . . . . . . . 139
8.8 Dual Uncertainty Model Selector . . . . . . . . . . . . . . . . . . . . 140
8.9 Prototype system for GPGPU visualization . . . . . . . . . . . . . . 144
9.1 Spreadsheet for First Experiment, Without Uncertainty . . . . . . . . 149
9.2 Spreadsheet for First Experiment, With Uncertainty . . . . . . . . . . 149
9.3 Spreadsheet for First Experiment, Analytical Approach Using Tradi-
tional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
9.4 Monte-Carlo Spreadsheet for First Experiment . . . . . . . . . . . . . 151
9.5 Construction Cost Graph . . . . . . . . . . . . . . . . . . . . . . . . 155
9.6 Number of Formulae in Construction Experiments . . . . . . . . . . . 155
9.7 Formula and Layout Changes During Retrospection Experiments . . . 156
9.8 Background of Respondents . . . . . . . . . . . . . . . . . . . . . . 160
9.9 Average Completion Time . . . . . . . . . . . . . . . . . . . . . . . 161
9.10 Questions and Average Responses . . . . . . . . . . . . . . . . . . . 162
9.11 Hierarchical Business Plan Spreadsheet . . . . . . . . . . . . . . . . 163
9.12 Market Share Overview with Embedded Sheets . . . . . . . . . . . . 164
9.13 Market Share for Year 1 . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.14 Target Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.15 Typical Break-even Visualization . . . . . . . . . . . . . . . . . . . . 166
9.16 Probabilistic Break-even Visualization . . . . . . . . . . . . . . . . . 167
List of Tables
2.1 Sources and Causes of Information Uncertainty . . . . . . . . . . . . 10
2.2 Data Stages in the Data State Model . . . . . . . . . . . . . . . . . . 22
2.3 Transformation Operators in the Data State Model . . . . . . . . . . . 23
4.1 Format of Cells in the Prototype System . . . . . . . . . . . . . . . . 49
4.2 Prototype Uncertainty Interrogation Functions . . . . . . . . . . . . . 52
5.1 Predicted Growth Rates used in Figure 3.1 . . . . . . . . . . . . . . . 62
5.2 Categories of Information Uncertainty Modeling Techniques . . . . . 67
5.3 Common Information Uncertainty Modeling Types . . . . . . . . . . 70
6.1 Information Uncertainty Visualization Objectives . . . . . . . . . . . 83
6.2 Questions Used to Elicit the User-Objective . . . . . . . . . . . . . . 86
8.1 Comparison of IntervalCell and SpreadsheetCell . . . . . . . . . . . . 132
8.2 Examples from the Prototype Addressing Scheme . . . . . . . . . . . 132
8.3 Novel Cell Types for GPGPU . . . . . . . . . . . . . . . . . . . . . . 142
8.4 Novel Functions for GPGPU . . . . . . . . . . . . . . . . . . . . . . 143
9.1 A Selection of Normal Probability Functions in Microsoft Excel 2003 146
9.2 Actions of the User . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.3 Retrospection Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.4 Respondents to the Survey . . . . . . . . . . . . . . . . . . . . . . . 157
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a
degree or diploma at any other higher education institution. To the best of
my knowledge and belief, the thesis contains no material previously pub-
lished or written by another person except where due reference is made.
Signature: Alexander Streit
Date:
CHAPTER 1
Introduction
1.1 Motivation
The term information uncertainty refers to vagueness, imprecision, fuzziness, likeli-
hood, and related uncertainty as it is present in information. Many problems are subject
to information uncertainty and, in response, numerous techniques have been developed
to model this uncertainty. Modeling information uncertainty not only provides greater
confidence in results, but can also give an indication of how much confidence to place
in the results. While visualization is a popular tool, information uncertainty visualiza-
tion is far less widespread.
In this work we have identified four major barriers to the uptake of information
uncertainty modeling and visualization. Firstly, there are numerous information uncer-
tainty modeling techniques, each of which is treated differently. This forces users to
anticipate their information uncertainty needs before building their data model. Sec-
ondly, parameters of the uncertainty space tend to be treated at the same level as vari-
ables, which makes it easier to introduce avoidable errors and causes the information
uncertainty modeling technique to dictate the structure of the user’s model. Thirdly,
propagation of uncertainty information must be manually managed by the user, which
requires expertise, is error prone, and can be tedious. Fourthly, uncertainty visualiza-
tion techniques tend to be developed for particular information uncertainty types and
they are largely incompatible with other forms of uncertainty information. This nar-
rows the selection of visualization techniques available and results in a tendency for ad
hoc information uncertainty visualization techniques.
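The encapsulation idea that addresses the second and third barriers can be illustrated
with a minimal sketch. This is purely illustrative and is not the prototype described
later in the thesis (which is a spreadsheet system); the Interval class and its names
are hypothetical. Once a value and its uncertainty are encapsulated in one atomic
variable, ordinary arithmetic can propagate the uncertainty automatically, with no
manual bookkeeping by the user:

```python
class Interval:
    """A value known only to lie somewhere between lo and hi."""

    def __init__(self, lo, hi):
        self.lo, self.hi = min(lo, hi), max(lo, hi)

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        other = _as_interval(other)
        # The extremes of a product lie at one of the four corner products.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


def _as_interval(x):
    # A plain number is a degenerate (fully certain) interval.
    return x if isinstance(x, Interval) else Interval(x, x)


# An interest rate of 5 +/- 0.5 percent, encapsulated as one variable.
rate = Interval(4.5, 5.5)
cost = rate * 1000 + 250   # uncertainty propagates without user effort
```

Changing `rate` to a certain value, or to a richer uncertainty model, leaves the
formula itself untouched, which is the behaviour the encapsulation approach aims for.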
Information uncertainty modeling makes it more difficult to manage the data model
due to increased information. Furthermore, it is common that a chosen uncertainty
modeling technique will subsequently need to be changed, since knowledge about the
uncertainty changes as more information becomes available. This is currently a diffi-
cult and error prone process.
Visualization of information uncertainty poses its own unique challenges. Existing
visualization techniques may not be appropriate for uncertainty information, and there
are issues with information overloading and interpretability of results. On a practical
level, there is a lack of tools that are conducive to visualizing information uncertainty.
Easing the burden of information overload in modeling and visualization requires
an integrated system that covers the entire workflow cycle from data
acquisition to visualization. Tools are also needed to help users with higher-level tasks
such as selection of modeling and propagation options, and organization and compar-
ison of visual mappings. More specifically, the architecture should support automated
uncertainty propagation and allow easy switching between different uncertainty mod-
els, and different methods of display.
Spreadsheets are often used to perform uncertainty based analysis and they have
previously been shown to be well suited as an approach to visualization. However, the
benefit of a spreadsheet approach to uncertainty modeling and visualization has not yet
been explored. This thesis extends the spreadsheet paradigm to support information
uncertainty modeling and visualization in an integrated whole.
1.2 Aims
The overall aim of this thesis is to devise an integrated information uncertainty mod-
eling and visualization environment that has the following features:
Hierarchical structure: The system should differentiate between levels of detail in
the data model. Uncertainty information is of a lower level of detail than the
variables.
Reduce data type lock-in: If a data model is constructed using particular informa-
tion uncertainty modeling techniques, the cost to change to another modeling
technique should be minimized.
Adaptive: Information about the uncertainty space of a variable should be easy to add,
change, or remove at any stage of the modeling and visualization process.
Seamless integration of information and its uncertainty: There should not be an ar-
tificial separation between the information and its uncertainty.
Simplify information uncertainty modeling: Users should not be required to have
an intimate understanding of the modeling technique mechanics in order to use
it.
Automate propagation: Uncertainty information needs to be propagated and the sys-
tem should carry this out automatically.
Less error prone: The system should reduce the potential for user induced errors.
Flexible: Users should be able to map uncertainty information into alternative models
and visual features so that they can explore the impacts of different modeling
and visualization techniques.
Robust: When the uncertainty information changes, the existing data model and vi-
sualizations should continue to function correctly.
Extensible: There are numerous information uncertainty modeling techniques, and
the design of the system should allow for more to be added.
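The "flexible" and "robust" requirements above hinge on abstraction: a visual mapping
should consume a summary of the uncertainty rather than its concrete type. The
following sketch (Python; the class names are invented for illustration and do not
appear in the prototype) shows the shape of such an abstraction:

```python
from abc import ABC, abstractmethod


class Uncertain(ABC):
    """Common abstraction consumed by visual mappings."""

    @abstractmethod
    def bounds(self):
        """Return a (low, high) range summarizing the uncertainty."""


class IntervalValue(Uncertain):
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def bounds(self):
        return (self.lo, self.hi)


class NormalValue(Uncertain):
    def __init__(self, mean, stdev):
        self.mean, self.stdev = mean, stdev

    def bounds(self):
        # Roughly 95% of the probability mass lies within two
        # standard deviations of the mean.
        return (self.mean - 2 * self.stdev, self.mean + 2 * self.stdev)


def error_bar(value: Uncertain) -> str:
    # The mapping sees only the abstract range, so the underlying
    # uncertainty model can be swapped without breaking the visualization.
    lo, hi = value.bounds()
    return f"|--[{lo}..{hi}]--|"
```

Replacing an IntervalValue with a NormalValue changes how the range is computed,
but the error_bar mapping continues to function, which is the robustness the list
above asks for.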
In order to achieve this aim, the following tasks are performed:
• Examine the field to determine the current state of play, covering information
uncertainty modeling techniques, visualization processes and practices, and un-
certainty visualization;
• Design an integrated information uncertainty modeling and visualization frame-
work;
• Investigate how the spreadsheet paradigm can be extended to intrinsically sup-
port information uncertainty modeling and visualization;
• Explore uncertainty encapsulation as an approach to semantic association of in-
formation and its uncertainty;
• Develop an automated propagation mechanism and a method for resolving un-
usual modeling technique combinations;
• Design uncertainty abstractions that enable visualization mappings to be data
type independent;
• Explore the user-objectives approach as a means for defining visualization char-
acteristics;
• Conduct a quantitative analysis comparing the cost of our approach to existing
methods;
• Analyze feedback from potential users;
• Conduct case studies on financial decision support and business planning to es-
tablish the viability of the spreadsheet for commercial uses;
• Investigate the capability of the architecture to be applied to non-uncertainty uses
through a case study; and
• Draw conclusions and make recommendations for future work.
1.3 Scope

This thesis deals with information uncertainty, which is uncertainty about the true value
of a unit of information. The intrinsic connection between uncertainty and information
is the basis for our encapsulation approach, which underpins the automatic propagation
and visualization-oriented uncertainty abstraction. However, there exist several other
forms of uncertainty, such as uncertainty arising from interpretation, for which the
encapsulation approach may not be suitable. The methods presented in this thesis are
limited to those forms of uncertainty that can be parametrized in some quantifiable
way.
Modeling of uncertainty has its foundation in mathematics. This project is con-
cerned with the frameworks, approaches, and methods for applying these modeling
techniques. As such, mathematical issues will be touched on; however, detailed cov-
erage of mathematical models is beyond the scope of this work, and it is assumed that
users will use the mathematical techniques appropriate to their problem.
1.4 Original Contribution

Current investigations into information uncertainty visualization have focused on vi-
sualization techniques for particular information uncertainty data types. We approach
the problem of information uncertainty visualization holistically, from modeling and
automated propagation through to user-objectives in visualization.
We produce an integrated information uncertainty modeling and visualization frame-
work and design the information uncertainty visualization spreadsheet, which intrin-
sically supports information uncertainty modeling, automated uncertainty propagation,
and uncertainty model abstracted visualization.
To achieve this we extend the spreadsheet paradigm to incorporate information un-
certainty and visualization features. This requires a number of components. Firstly,
our encapsulation of uncertainty information approach semantically links the infor-
mation to its uncertainty. Secondly, we introduce the uncertainty propagation model
to manage the mechanics of propagating uncertainty, including operations involving
mixed data type parameters. Thirdly, we present hierarchical heterogeneous propaga-
tion, which automatically determines suitable combinations of the available methods
to ensure that the propagation can be achieved. Fourthly, we produce uncertainty ab-
straction models, which abstract the uncertainty information for visual mapping in vi-
sualizations by providing a common plural value type. Fifthly, we incorporate flexible
visualization capabilities into the spreadsheet using a visualization sheet.
Abstraction from the information uncertainty data type means that traditional data
type specific visual mapping criteria may no longer be applicable, leaving a gap in the
knowledge. To address this, we investigate user-objectives for information uncertainty
visualization, which describe the characteristics of uncertainty space that the user is
seeking to visualize. User-objectives provide a data type abstracted means of describ-
ing, executing, and evaluating visualizations.
1.5 Significance

The significance of this work is that it provides an intuitive and non-intrusive
environment for modeling and visualizing information uncertainty. This has
three major effects. Firstly, access to information uncertainty visualization is designed
into the system from the outset and it does not require user expertise in uncertainty
techniques to manage information uncertainty. Secondly, uncertainty information is
easily added, changed, or removed at any stage of the process. Thirdly, information
uncertainty visualizations can be built independently of the modeling technique, pro-
viding a coherent foundation for the development of visualization techniques while
reducing their tendency to be ad hoc.
Information uncertainty is a problem in many fields. Overcoming barriers to its
modeling and visualization is an important step in managing a difficult problem.
1.6 Organization of the Thesis

The organization of this thesis is as follows. Chapter 2 introduces background mate-
rial on uncertainty modeling techniques, visualization techniques, and what has been
done to visualize uncertainty. Chapter 3 describes the framework that integrates infor-
mation uncertainty modeling and visualization tasks together into a coherent whole.
Chapters 4 through 6 cover the components of the framework: Chapter 4 elaborates
on the spreadsheet paradigm as a mechanism for integrating and managing these tasks.
Chapter 5 investigates the encapsulation approach to information uncertainty, which
includes the unified hierarchy and automated propagation. Chapter 6 explores the ab-
straction approach, which includes uncertainty abstraction models and user-objectives
for visualization. Chapter 7 integrates the components into a core system, covering
the requirements, design, and architecture. Chapter 8 considers advanced features and
extensibility of the system. Chapter 9 presents the evaluations of the system, with a
comparative analysis of different approaches, a discussion of a survey, and a case study
in business planning. Chapter 10 provides a conclusion and points to future work.
CHAPTER 2
Background
“As far as the laws of mathematics refer to reality, they are not certain;
and as far as they are certain, they do not refer to reality.”
– Albert Einstein1
2.1 Introduction

Information uncertainty is a complex subject that is inherent in many real-world prob-
lems. The uncertainty comes from different sources and can be interpreted and mod-
eled in various ways. There are often subtle interactions between variables and uncer-
tainty, which can be difficult to understand. Visualization of information uncertainty
presents an opportunity to provide deeper insights into the nature of the information,
its uncertainty, and the impact it has on outcomes. However, the difficulty of adopt-
ing information uncertainty and the lack of visualization tool support has caused many
practitioners to ignore the uncertainty completely or to ignore situations where the
uncertainty is deemed too high. This practice results in valuable knowledge being dis-
carded and reduced quality in outcomes, or worse, can even result in entirely wrong
outcomes.
There are aspects of information uncertainty that have been given considerable at-
tention, particularly in the field of mathematics. Two aspects have been especially
well developed: the first includes the various mathematical models that exist for repre-
senting, measuring, and recording uncertainty. The second aspect is the collection of
1In J. R. Newman (ed.) The World of Mathematics, New York: Simon and Schuster, 1956
rules and techniques for propagating, estimating, and minimizing information uncer-
tainty. These models and techniques range from the statistical methods and probabil-
ities through to fuzzy models. Research into visualization of information uncertainty
has only been carried out sporadically during the last decade. Earlier work has focused
on a data-driven approach, with visual data representations for particular data types or
responding to the needs of specific applications. More recent work has investigated
task-based approaches and sought to integrate higher-level issues, such as software
architectures and frameworks for visualization systems.
The aim of this chapter is to provide background for understanding information un-
certainty modeling and visualization by examining relevant works and identifying key
issues. The chapter is organized as follows. Section 2.2 describes information uncer-
tainty in general, covering sources of information uncertainty, understanding informa-
tion uncertainty and its usage, and information uncertainty modeling techniques. Sec-
tion 2.3 discusses relevant issues in visualization, focusing on the process and sense-
making cycle, and visualization techniques. Section 2.4 examines current progress and
key techniques in information uncertainty visualization. A summary of this chapter is
given in Section 2.5.
2.2 Information Uncertainty

In many circumstances the true value of a variable is not fully known, giving rise
to information uncertainty. The information that is known about the variable can be
stored and this technique is referred to as information uncertainty modeling. As an
example, information uncertainty modeling can be used to aid analysis of the potential
environmental impact of a new road. Data is required about the type, amount, and
distribution of vegetation; the variety, location, and habits of local animals; and how
all of these interact. The data that is collected will only be accurate to a certain level
of precision, which can be modeled. Further, much of the information derived from
expert knowledge will be qualitative in nature and thus dependent on interpretation.
It is already a significant task to understand the structure, characteristics, trends,
and interdependency of data. However, information uncertainty serves to complicate
things even further as it requires an understanding of the propagation of uncertainty,
the potential for variation in outcomes, and impacts due to changes in the level of
information uncertainty. Effective visualization of the information and its uncertainty
can help to overcome this problem.
Historically, uncertainty had been regarded as an undesirable factor that is to be
avoided. Only in the 20th century has it become a fundamental component of sci-
ence [62]. However, the term uncertainty itself can vary depending on the author and
the field. For example, Hunter and Goodchild, dealing with spatial databases, reserve
the term uncertainty to refer exclusively to unknown inaccuracy and instead use the
term error for objectively known inaccuracy [47]. Pang et al. use the term uncer-
tainty to cover three categories [86]: statistical, including probabilistic and confidence
methods; error, which refers to differences between estimates and actual values; and
range, which covers intervals of possible values. Klir [58, 60] and Gershon [33] offer
a more general definition of uncertainty as some deficiency in information, and from
there define a measure of information in terms of reduction in uncertainty. Standards
and guidelines have been developed for the management of uncertainty in measure-
ment. One such guide by the National Institute of Standards and Technology (NIST)
describes measurements as approximations and contends that “the result is complete
only when accompanied by a quantitative statement of its uncertainty” [112, pp. 1]. A
similar guide that was issued for analytical chemistry by EURACHEM defines mea-
surement uncertainty as a parameter “that characterizes the dispersion of the values
that could reasonably be attributed to the measurand” [29, pp. 4]. The common theme
is that uncertainty can be characterized for a particular unit of information, and we use
the term information uncertainty to refer to situations where this condition holds.
2.2.1 Sources of Information Uncertainty

Pang et al. [86] investigated uncertainty visualization and categorized sources of in-
formation uncertainty based on the point of the visualization process in which it is
introduced. The resulting three categories are acquisition, where information uncer-
tainty is introduced from the measurements and models; transformation, introduced
during the information processing step for visualization; and visualization, referring to
the uncertainty introduced through the act of the visualization itself. These categories
are helpful in characterizing the introduced uncertainty for visualization, but lack gran-
ularity in describing the reason for the uncertainty. Thomson et al. [114] focused on
the tasks of information analysts in the field and used their descriptive terms to derive
a categorization for uncertainty in geospatially referenced information.
Information uncertainty can arise due to a number of reasons. Whenever predic-
tions are made, they are uncertain. Errors and imprecision in measurement are another
common source. The EURACHEM guide lists eleven sources of measurement uncer-
tainty [29], but is careful to point out that these may not necessarily be independent.
While their list includes “operator effects” to cover human introduced uncertainty, the
sources are mostly concerned with acts of measurement. Pham and Brown [91] pro-
vide a categorization of uncertainty into three categories: factual, pseudo-measurement
and pseudo-numerical, and perceptual-based. Factual information is numerical and
measurement-based. Pseudo-measurement and pseudo-numerical information are nu-
meric approximations. Perceptual-based information is typically linguistic, but can
also be image- or sound-based. Table 2.1 lists typical sources of information un-
certainty and examples of causes (from [92]). Earlier work by Reznik and Pham
matched nine similar categories of uncertainty sources to uncertainty modeling tech-
niques [101].
Limited accuracy: Limitation in measuring instruments, computational processes, or
standards.

Missing data: Physical limitation of experiments; limited sample size or
non-representative sample.

Incomplete definition: Impossibility or difficulty in articulating exact functional
relationships or rules.

Inconsistency: Conflicts arising from multiple sources or models.

Imperfect realisation of a definition: Physical or conceptual limitation.

Inadequate knowledge about the effects of the change in environment: Model does
not cover all influence factors; or is made under slightly different conditions; or
is based on the views of different experts.

Personal bias: Differences in individual perception.

Ambiguity in linguistic descriptions: A word may have many meanings; or a state
may be described by many words.

Approximation or assumptions embedded in model design methods or procedures:
Requirements or limitations of models or methods.

Table 2.1: Sources and Causes of Information Uncertainty
2.2.2 Understanding Information Uncertainty

The search for truth is a goal of science and the presence of uncertainty can imply
a deficiency in our understanding. This explains why throughout most of recorded
history scientific thought has sought to avoid uncertainty2. However, attitudes toward
uncertainty have begun to shift, partly due to discoveries such as the Heisenberg un-
certainty principle. Today, uncertainty is viewed as an intrinsic property of problems
in most fields. For example, Couclelis noted that considerable effort had been devoted
to fighting uncertainty in Geographic Information Systems (GIS), but that there are
many things that cannot be known, and the inability to know was not due to human
limitation [19].

2The interested reader is directed to Appendix A of [2] for a history of perspectives on knowledge.
Many disciplines use information uncertainty modeling techniques to manage un-
certainty. The incorporation of information uncertainty techniques enables practition-
ers to describe and quantify the uncertainty space. Uncertainty information at the
inputs can then be propagated through the model to the outputs. The output can now
provide additional information. For example, how much confidence we should place in
the result, what alternatives the result may have, and others depending on the modeling
technique and the inputs.
A recent use for information uncertainty techniques is to simplify systems by re-
moving less important information. This mirrors human reasoning, where we reserve
detail for items of interest. For example, when ascertaining whether to jump out of the
way of a moving vehicle, a rough estimate of the vehicle’s velocity is usually sufficient
to determine the appropriate action [77] and precise knowledge of the actual velocity
is usually not necessary.
There are two main approaches to information uncertainty modeling and propaga-
tion. The first approach is to use analytical techniques, which require an understanding
of mathematical principles involved. The second approach is to use numerical tech-
niques, such as Monte-Carlo simulation. Sometimes numerical techniques are used
implicitly without the user realizing, usually by manually varying the inputs and ob-
serving their effects.
Uncertainty is so intrinsic to information that Klir [62, 63, 58, 61, 59, 60] has been
working on a generalized information theory, which has the aims of incorporating
uncertainty and information into a unified theory. Their approach is to conceive of
uncertainty-based information as being the result of a reduction in uncertainty. As a
result of some action, the a priori uncertainty U1 becomes the a posteriori uncertainty
U2, and the information derived from this action is therefore given by U1 − U2 [59].
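As a minimal illustration of this view (assuming, for concreteness, the Hartley measure of nonspecificity over a set of equally possible alternatives as the uncertainty measure; the numbers are invented):

```python
import math

def hartley(n_alternatives: int) -> float:
    """Hartley uncertainty (in bits) over a set of equally possible alternatives."""
    return math.log2(n_alternatives)

# A priori, the true value could be any of 8 alternatives; an observation
# then rules out all but 2 of them.
u1 = hartley(8)            # a priori uncertainty U1
u2 = hartley(2)            # a posteriori uncertainty U2
information_gained = u1 - u2
print(information_gained)  # 2.0 bits
```

The information yielded by the observation is exactly the reduction in uncertainty, U1 − U2.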
Using information uncertainty modeling techniques not only provides greater con-
fidence in results, but can also give an indication of how much confidence to place in
the result.
2.2.3 Approaches to Modeling Information Uncertainty

There are numerous uncertainty modeling techniques and we will describe several of
the major ones here. Since the sources and causes of uncertainty are different, various
mathematical models have been developed to faithfully represent different types of
information. We summarise these models into four common types:
Probability denotes the likelihood of an event to occur or a positive match. Prob-
ability theory provides the foundation for statistical inference (e.g. Bayesian
methods) [1, 5].
Possibility provides alternative matches, e.g. a range of errors in measurement [2].
Provability is a measure of the ability to express a situation where the probability of
a positive match is exactly one. Provability is the central theme of techniques
such as Dempster-Shafer calculus.
Membership denotes the degree of match and allows for partially positive matches,
e.g. fuzzy sets [3, 6], rough sets [4].
Probability theory models uncertainty in terms of anticipation: the expectation that an
outcome will eventuate is characterized by a probability.
Classical probability theory describes the ratio between favorable and indifferent
outcomes, which has several shortcomings. This led to the development of frequentist
probability theory, which defines the chance of a given result under random conditions.
Thus, in a repeated experiment the probability of an event will tend toward the ratio
of the number of times it occurs to the number of times the experiment was run:
Pr(x) = x / (x + x̄)

where Pr : X → [0,1] is the probability function, x ∈ X is the event, and x̄ = X − {x}
is all other outcomes. A probability distribution completely describes the expected
outcomes of a random variable. For real-valued random variables, the probability dis-
tribution can be defined by
F(x) = ∑_{xᵢ ≤ x} Pr(xᵢ)

for discrete probabilities and

F(x) = ∫_{−∞}^{x} f(t) dt

for continuous probabilities, where f is the probability density function. A proba-
bility density function (PDF) is effectively a histogram of expected outcomes, with a
scale such that the integral is unity, ∫ f(t) dt = 1.
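These definitions translate directly to code. The sketch below (the outcome data are invented for illustration) estimates a discrete probability function from observed frequencies and evaluates the corresponding distribution function F:

```python
from collections import Counter

def empirical_pr(outcomes):
    """Estimate Pr(x) as the ratio of occurrences to trials (the frequentist view)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def F(pr, x):
    """Discrete distribution function: the sum of Pr(x_i) over all x_i <= x."""
    return sum(p for xi, p in pr.items() if xi <= x)

rolls = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
pr = empirical_pr(rolls)
print(pr[3])               # 0.3
print(round(F(pr, 3), 3))  # 0.6
```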
Probability distributions can take any form; however, several well-studied distri-
butions exist. Of these, the two most commonly used distributions are the uniform
distribution and the normal distribution. The uniform distribution assigns every out-
come an equal probability. The normal distribution3 has a PDF of
f(x) = (1 / (σ√(2π))) e^(−(x−μ)² / (2σ²))

3also known as the Gaussian distribution after Gauss
where μ is the mean and σ is the standard deviation. Normal distributions find
common use because the sum of many independent random variables will approximate
a normal distribution.
Monte-Carlo simulation is a numerical approach to uncertainty that uses probabil-
ity distributions [76, 3]. Input variables are assigned a probability distribution, com-
monly a uniform or normal distribution. Numerous random instances are chosen for
the inputs according to these distributions, and the outputs that are calculated can then
be used to characterize the outputs of the system.
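A minimal sketch of this numerical approach (the model and input distributions here are hypothetical, chosen only to illustrate the mechanics):

```python
import random
import statistics

def monte_carlo(model, samplers, n=100_000):
    """Propagate input uncertainty through `model` by repeated random sampling."""
    return [model(*(sample() for sample in samplers)) for _ in range(n)]

random.seed(1)
# Hypothetical inputs: a length of 10 +/- 0.5 (normal) and a width in [4, 6] (uniform).
length = lambda: random.gauss(10, 0.5)
width = lambda: random.uniform(4, 6)

areas = monte_carlo(lambda l, w: l * w, [length, width])
print(round(statistics.mean(areas), 1))   # close to the expected area of 50
print(round(statistics.stdev(areas), 2))  # spread induced by the input uncertainty
```

The collected outputs can then be summarized (mean, spread, percentiles) to characterize the uncertainty of the result.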
The frequentist view of probability models an expectation based purely on fre-
quency of events. An alternative view is Bayesian probability theory, which has found
widespread use in fields such as machine learning and computer vision [90], and
econometrics [35]. The mathematician Thomas Bayes introduced a theorem that was
generalized by Laplace but is still referred to as Bayes’ Theorem. The theorem relates
the conditional and marginal probability of events of two random variables, x and y:
Pr(x|y) = Pr(y|x) Pr(x) / Pr(y)
Bayes’ theorem enabled a new philosophical view of probabilities as modeling
belief, using what is called Bayesian inference. Thus, our expectation of an event can
be revised: Pr(x|y) is the posterior probability, our revised expectation of event x given
evidence y; Pr(y|x) is the conditional probability of seeing y given the hypothesis that
x is true; Pr(x) is the prior probability of x; and Pr(y) is the marginal probability of y,
whether or not x is true.
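The theorem is straightforward to apply in code. In this sketch (the numbers are hypothetical), the marginal Pr(y) is expanded using the law of total probability:

```python
def posterior(prior_x, pr_y_given_x, pr_y_given_not_x):
    """Pr(x|y) via Bayes' theorem, with Pr(y) expanded by total probability."""
    pr_y = pr_y_given_x * prior_x + pr_y_given_not_x * (1 - prior_x)
    return pr_y_given_x * prior_x / pr_y

# Hypothetical diagnostic test: 1% prevalence, 95% true-positive rate,
# 5% false-positive rate.
print(round(posterior(0.01, 0.95, 0.05), 3))  # 0.161
```

Even a highly accurate test yields a modest posterior belief when the prior is small, which is exactly the kind of revision of expectation that Bayesian inference captures.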
Probabilities, whether frequentist or Bayesian, are not the only means of describ-
ing uncertainty. Classical sets can also be used to express uncertainty. For example, the
diagnosis made by a medical doctor could be that a patient suffers from the flu or the
cold. In this situation there is a possibility that either (or both) be true and there is un-
certainty about which it is. Lotfi Zadeh, in his seminal 1965 paper, proposed fuzzy sets,
which along with fuzzy logic enable human-like reasoning using partial truth4. A fuzzy
set is a set where each member can be assigned a truth value. This can be expressed as a
fuzzy membership function: μ : X → [0,1], where 0 indicates not a member, 1 indicates
definitely a member, and all numbers in between indicate a partial membership.
A good example of fuzzy sets is given by Mendel in [77]. College students were
asked to rank words such as “somewhat” and “quite a bit” against a numerical scale of
quantity. Although individual answers varied significantly, a clear ordering emerged
and Mendel was able to produce a mapping between these words and their indication of
4Zadeh was not the first to investigate partial truth; interested readers are directed to the works of Łukasiewicz [71]
quantity. From there fuzzy sets can be constructed, such as the set lots, which captures
the degree to which numeric values can be representations for each of the words.
For example, assuming that 24◦C is normal room temperature and 30◦C is com-
pletely hot, temperatures between 24◦C and 30◦C are partially hot. The set of hot
temperatures might therefore be given by the following membership function, illus-
trated graphically in Figure 2.1:
μHot(x) =  1,                      x > 30
           (x − 24)/(30 − 24),     24 ≤ x ≤ 30
           0,                      x < 24
Figure 2.1: Fuzzy Set for Hot
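The membership function above maps directly to code; a minimal sketch (the function name is ours):

```python
def mu_hot(x: float) -> float:
    """Degree of membership of temperature x (deg C) in the fuzzy set Hot."""
    if x > 30:
        return 1.0
    if x < 24:
        return 0.0
    return (x - 24) / (30 - 24)

print(mu_hot(22))  # 0.0 -- not hot at all
print(mu_hot(27))  # 0.5 -- partially hot
print(mu_hot(31))  # 1.0 -- completely hot
```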
Fuzzy logic is the inference counterpart for fuzzy sets. A complete fuzzy logic sys-
tem consists of a fuzzifier, inference engine, and defuzzifier [77]. The fuzzifier maps
numeric values to partial memberships of fuzzy sets. The inference engine executes
operations and rules on fuzzified information. The defuzzifier converts a fuzzy repre-
sentation into a bi-valued representation, typically using an α-cut.
The graph in Figure 2.2 shows an example fuzzifier that maps room temperatures to
the fuzzy sets Cold, Normal, and Hot. A temperature of 25.5◦C will be wholly outside
the set Cold, mostly within the set Normal, and to a lesser extent partially within the
set Hot.
Methods are defined for several operations, including fuzzy AND (intersection),
OR (union), and NOT (complement)5. Traditional fuzzy logic, also called Zadeh fuzzy
logic after its inventor, is shown graphically in Figure 2.3. Given two fuzzy variables,
a and b:

a ∪ b = a OR b = max(μ(a), μ(b))
a ∩ b = a AND b = min(μ(a), μ(b))
¬a = NOT a = 1 − μ(a)

5Implementations of the operators can vary depending on the application; interested readers are directed to [77]
Figure 2.2: Fuzzification for Temperature
(Panels: Set A; Set B; A AND B; A OR B; NOT A.)
Figure 2.3: Results of Fuzzy Operations are Shown by the Grey Shaded Regions
Thus rules can be established using constructs such as D = A ∧ B ∨ ¬C, where A,
B, C, and D are fuzzy sets.
The defuzzifier typically uses an α-cut, which is a mechanism to translate the fuzzy
output into traditional bi-valued truth, most typically:
μ′(d) =  1,  μ(d) ≥ α
         0,  otherwise

where α is called the "α-cut plane" and is in the range [0,1]. An example
of defuzzification with an α-cut of 0.5 is given graphically in Figure 2.4. As the graph
shows, the set of normal temperatures is mapped to the interval [21.5,26.5].
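The Zadeh operators and an α-cut defuzzifier can be sketched as follows (function names are ours, for illustration):

```python
def fuzzy_or(a: float, b: float) -> float:   # union: max of memberships
    return max(a, b)

def fuzzy_and(a: float, b: float) -> float:  # intersection: min of memberships
    return min(a, b)

def fuzzy_not(a: float) -> float:            # complement
    return 1.0 - a

def alpha_cut(membership: float, alpha: float = 0.5) -> int:
    """Defuzzify a membership degree into a bi-valued truth."""
    return 1 if membership >= alpha else 0

a, b = 0.7, 0.2
print(fuzzy_or(a, b))             # 0.7
print(fuzzy_and(a, b))            # 0.2
print(round(fuzzy_not(a), 1))     # 0.3
print(alpha_cut(fuzzy_or(a, b)))  # 1
```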
Figure 2.4: Defuzzification Using an α-cut
A popular representation for uncertainty is the rough set [96]. Rough sets extend
classical sets to allow an element to be both inside and outside the set. Thus there are
three modes: inside, outside, and both. Three operations are defined that translate to
a classical set: the upper limit, the lower limit, and the boundary. The upper limit in-
cludes all items that are wholly inside, or both inside and out. The lower limit includes
only those items that are wholly inside the set. The boundary includes only items that are both
inside and outside the set. This is illustrated graphically in Figure 2.5. An example
application of rough sets is in classifying customer details: the rough set information
provided will contain a customer if all information has been provided, not contain a
customer if no information is provided, and be in both states if some information is
provided but some is missing. The company can send letters requesting information to
¬LOWER(c) where c is the “information provided” rough set.
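The customer example can be sketched as follows (the class and names are ours, for illustration):

```python
class RoughSet:
    """A rough set: elements may be inside, outside, or both (the boundary)."""

    def __init__(self, inside=(), boundary=()):
        self._inside = set(inside)      # wholly inside the set
        self._boundary = set(boundary)  # both inside and outside

    def lower(self):
        """Lower limit: items wholly inside the set."""
        return set(self._inside)

    def upper(self):
        """Upper limit: items wholly inside, or both inside and outside."""
        return self._inside | self._boundary

customers = {"alice", "bob", "carol"}
# alice provided all details, bob provided some, carol provided none.
provided = RoughSet(inside={"alice"}, boundary={"bob"})

# Send information-request letters to NOT LOWER(provided):
print(sorted(customers - provided.lower()))  # ['bob', 'carol']
```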
Figure 2.5: Example Rough Set for Containment of a Region
Another common classical set-based uncertainty modeling technique is the inter-
val. Intervals define the upper and lower boundaries on a continuum, most commonly
R. The boundaries themselves can be inclusive, which is indicated using square brack-
ets; or exclusive, indicated using rounded brackets. Thus, [0,1) includes zero but ex-
cludes one. Interval arithmetic defines the propagation of uncertainty under common
arithmetic operators. For example, addition is defined as:
[a, b] + [c, d] = [a + c, b + d]
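Interval propagation is simple to implement; a sketch of our own (supporting addition and, for comparison, multiplication, whose bounds come from the extreme endpoint combinations):

```python
class Interval:
    """A closed interval [lo, hi] with uncertainty propagation under + and *."""

    def __init__(self, lo: float, hi: float):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product bounds come from the four endpoint combinations.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

print(Interval(0, 1) + Interval(2, 3))   # [2, 4]
print(Interval(-1, 2) * Interval(3, 4))  # [-4, 8]
```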
2.3 Visualization

2.3.1 The Sensemaking Process

The user has a visualization objective. To achieve this objective they will use a visual-
ization technique. The technique transforms information and displays it according to
the parameters of the technique. To reach their objective, users will typically iteratively
adjust the parameters and view the results, repeating as often as necessary. There are
three general classes of user objectives, which can also be described as visualization
phases [56]:
1. Exploration, searching the data for relations and patterns
2. Analysis, exploring known relations
3. Presentation, preparing the visualization to communicate information to others
Visualization requires an iteration of choose-inspect-view-adjust, which can be
cumbersome, particularly for novice visualization users. Several studies have there-
fore considered improving the user experience while going through the visualization
process [21, 49, 93, 11].
One study sought to encode the visualization exploration process using an XML-
based language [49]. The encoding captures the parameters used for each iteration of
the loop. By defining a parameter derivation calculus, the results of several visualiza-
tion sessions can then be visualized. Such visualizations of visualization sessions are
designed to aid the user in understanding the progression of their use of the system.
Although the work seeks to formalize the visualization process, it does not improve the
process and is limited to modulating parameters of a particular visualization technique.
A significant drawback of the work in [49] is that the ability to change to another type
of representation or selection of alternate data are not included in the model.
Visualization is a tool and not an objective in itself. However, it has been ob-
served (e.g. [73]) that some visualizations are good for publications, tending to be
colorful and showy images, but not informative or applicable to real-world problem-
solving. Ma [73] argues that scientists need to be involved in evaluating the effec-
tiveness of visualization methods, and suggests working with users from application
domains both to devise the requirements of the visualization and to subsequently eval-
uate the techniques through case studies.
Another suggestion for overcoming these obstacles is for visualization to be task-
driven instead of data-driven [93, 11]. One approach to this is through an agent-based
framework [93] (see Figure 2.6), where a profile agent observes the user’s choice of
visualizations and adjusts the system's behavior to improve workflow.
Figure 2.6: Four agents work together in the Visualization Support System
Another proposal is the “Visualization Task Network (VTN)” [11] (see Figure 2.7,
from [11, pp. 603]), which can learn the requirements of the user. A VTN is a task-
oriented approach, where the user first selects the task to be achieved. For each chosen
task a set of techniques are proposed by the system. Once a technique is chosen, a
list of attributes (e.g. glyph information, grid spacing, and color) is presented. These
parameters are similar to those in the work of Jankun-Kelly et al. [49]. Each time the
user selects a {task, technique, attribute} set for visualization, the system can increase
the weight of that combination. When the user selects a task, the techniques with the
highest weighting are shown first. Similarly, once a task and technique are chosen, the
attributes with the highest weighting are shown first.
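The weighting scheme described above can be sketched as follows (the class and example data are illustrative, not taken from [11]):

```python
from collections import defaultdict

class VTNWeights:
    """Sketch of VTN-style learning over {task, technique, attribute} selections."""

    def __init__(self):
        self.weight = defaultdict(int)

    def record_use(self, task, technique, attribute):
        # Each selected combination has its weight increased.
        self.weight[(task, technique, attribute)] += 1

    def ranked_techniques(self, task):
        # Techniques for a task, highest accumulated weight first.
        totals = defaultdict(int)
        for (t, technique, _attr), w in self.weight.items():
            if t == task:
                totals[technique] += w
        return sorted(totals, key=totals.get, reverse=True)

vtn = VTNWeights()
vtn.record_use("explore flow", "streamlines", "color")
vtn.record_use("explore flow", "streamlines", "glyph size")
vtn.record_use("explore flow", "contours", "color")
print(vtn.ranked_techniques("explore flow"))  # ['streamlines', 'contours']
```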
One approach to mapping visual features to visualization techniques takes an objective-
oriented viewpoint, and is derived from the visualisation data ontology outlined in [11].
The mapping begins with the choice of data attributes to be represented: relationships,
resemblances, order and proportion [34]. These attributes are then mapped to visual
features depending on the visualisation task to be performed. The visualisation task
is chosen to enhance the perception of the information required by the viewer for
their specific objective in performing the data analysis. The knowledge required for
such a task-oriented approach is encapsulated in an agent-based visualisation architec-
ture [11].
Figure 2.7: The Visualization Task Network (VTN) Learns Task-oriented Visualization Parameters
A workshop [27] established the visualization ontology represented
in Figure 2.8 (adapted from [27]). Development of the ontology involved investigation
of visualization from multiple perspectives. The result is a clear anatomy of
visualization, except that it is missing one vital part: the role of the user. The user
plays an integral role in the visualization process, driving parameters and the visual-
ization tasks. By excluding the user from the ontology the authors have neglected to
consider not only usability and cultural issues, but also opportunities such as adaptive
visualization systems.
[Figure content: ontology concepts (Visualisation, Data, Representation, Task, Technique, Transformation) linked by properties such as "is about", "uses", "supported-by", "input to", "output from", and "is realised through"; a key denotes is-a links, named properties, and elided hierarchies.]
Figure 2.8: An Ontology of Visualization
2.3.2 Visualization Techniques

The topic of visualization is traditionally introduced through reference to a taxonomy
of visualization techniques [41, 99, 15, 16]. A valuable reference text for visualization
techniques is given by Senay and Ignatius [106]. SIGGRAPH’s visualization education
program [41] introduces visualization techniques through a data type based classifica-
tion, which is reproduced in Figure 2.9. Examples of selected techniques are given in
Figure 2.10. The limitation of this classification is that it deals only with continuous
ordinal values. Visualizations for other types of data, such as trees, are not included.
Figure 2.9: Visualization techniques categorized by the type of data to be visualized [41]
Shneiderman [108] recognized the lack of trees and network graphs and addressed
this by including non-ordinal types in their taxonomy. However, the taxonomy itself
Figure 2.10: Selected examples of visualization techniques: (a) 2D line-based contouring, (b) 2D histogram, (c) 3D streamlines
continues to be based on the data type being visualized. The data types identified are
[108, pp. 337-339]:
• 1-Dimensional, such as textual documents, program source code, and alphabeti-
cal lists of names.
• 2-Dimensional, such as geographic maps, floor plans, and newspaper layouts.
• 3-Dimensional, real world objects such as molecules, the human body, and build-
ings.
• Temporal, such as medical records, project management, or hierarchical presentations;
these warrant a data type that is separate from 1-dimensional data.
• Multi-dimensional, such as records in relational databases.
• Trees, which are hierarchies where each item, except the root item, has a link to
its parent.
• Networks, which represent relations that cannot be captured as trees.
OLIVE [99] is an online catalog of visualization systems categorized according to
this taxonomy, although at the time of writing it is only current up to 1997. While
Shneiderman’s taxonomy covers a wider range of visualizations, not all visualization
systems fit conveniently. For example, visualizations that present temporally ordered
3-Dimensional data could fit into either the temporal- or the 3-Dimensional categories.
To overcome these inconsistencies Card and Mackinlay [15] offer a classification based
on additional factors that need to be considered during visualization. Their analysis of
visualization systems considers not only the type of data, but also the filtering functions
applied to them, the controlled (text) and automatic (glyph) processing techniques, the
viewing transformations, and the user interaction elements for every variable in the
visualization. Data types are classified as [15, pp. 92-93]:
• Nominal, meaning they are only equal or unequal to other values.
• Ordinal, meaning they obey a less-than relation.
• Quantitative, meaning it is possible to do arithmetic on them.
• Intrinsically spatial, which are the subset of quantitative types that represent spa-
tial points.
• Geographical, which are the subset of intrinsically spatial types that represent
geographic locations.
• A Set mapped to itself, which is the case in graphs and trees.
This taxonomy is cumbersome for the purpose of categorizing visualization techniques.
Unlike the preceding taxonomies, there is no single category for a visualization tech-
nique. Instead, each variable used in the visualization is decomposed according to
twelve factors and presented in a matrix. The matrices of two techniques can be com-
pared to pinpoint the exact differences between them. The intent of the authors is not
only to describe the differences in visualization techniques, but also to suggest new
possibilities for visualization techniques [15, pp. 92].
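A simplified illustration of such matrix comparison is sketched below; the factor names and technique descriptions are hypothetical stand-ins for Card and Mackinlay's twelve factors, chosen only to show how a factor-by-factor diff can pinpoint where two techniques differ:

```python
# Hypothetical, much-reduced matrices: each visualization variable is
# described by a few factors (stand-ins for the twelve real ones).
scatterplot = {
    "x": {"data_type": "quantitative", "mark": "point", "control": "automatic"},
    "y": {"data_type": "quantitative", "mark": "point", "control": "automatic"},
}
barchart = {
    "x": {"data_type": "nominal", "mark": "bar", "control": "automatic"},
    "y": {"data_type": "quantitative", "mark": "bar", "control": "automatic"},
}

def compare_matrices(a, b):
    """Pinpoint the exact factor-level differences between two techniques."""
    diffs = []
    for var in sorted(set(a) | set(b)):
        fa, fb = a.get(var, {}), b.get(var, {})
        for factor in sorted(set(fa) | set(fb)):
            if fa.get(factor) != fb.get(factor):
                diffs.append((var, factor, fa.get(factor), fb.get(factor)))
    return diffs

for diff in compare_matrices(scatterplot, barchart):
    print(diff)
```

An empty diff means the two techniques are indistinguishable under the chosen factors, which also hints at how "gaps" between matrices can suggest unexplored techniques.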
Chi [16] provides a taxonomy to help implementers understand how to implement
visualization techniques. The proposed taxonomy is based on their earlier work on
the Data State Reference Model [18]. A visualization technique is broken down into
four stages according to the state of the data, as shown in Table 2.2, and the data
transformation operators that transform the data from one stage to another, as listed in
Table 2.3.
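As a minimal sketch, the four stages and three operators of the Data State Reference Model might be wired together as below; the concrete abstractions chosen here (summary statistics, axis ranges, a textual view) are illustrative assumptions, not Chi's definitions:

```python
def data_transformation(values):
    """Value -> analytical abstraction: extract meta-data from the raw data."""
    return {"min": min(values), "max": max(values), "count": len(values)}

def visualization_transformation(abstraction):
    """Analytical abstraction -> visualization abstraction: visualizable content."""
    return {"axis_range": (abstraction["min"], abstraction["max"]),
            "bar_count": abstraction["count"]}

def visual_mapping_transformation(vis_abstraction):
    """Visualization abstraction -> view: the presentation the user sees."""
    lo, hi = vis_abstraction["axis_range"]
    return f"{vis_abstraction['bar_count']} bars on axis [{lo}, {hi}]"

raw = [3, 7, 1, 9]                        # stage 1: value
meta = data_transformation(raw)           # stage 2: analytical abstraction
vis = visualization_transformation(meta)  # stage 3: visualization abstraction
view = visual_mapping_transformation(vis) # stage 4: view
print(view)  # -> "4 bars on axis [1, 9]"
```

Each function corresponds to one operator of Table 2.3, and each intermediate value to one stage of Table 2.2.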
Stage                      Description
Value                      The raw data.
Analytical Abstraction     Data about data, or information, a.k.a. meta-data.
Visualization Abstraction  Information that is visualizable on the screen using a visualization technique.
View                       The end-product of the visualization mapping, where the user sees and interprets the picture presented to her.
Table 2.2: Data Stages in the Data State Model
Tory and Möller [115] argue that these taxonomies are vague because of the termi-
nology used. As an example they cite the use of the word “often” in Card and Mackin-
lay’s definition [115, pp.1]. To reduce this ambiguity, their taxonomy is based on the
data model rather than the type of data itself. A data model is a representation of data
Processing Step                Description
Data Transformation            Generates some form of analytical abstraction from the value (usually by extraction).
Visualization Transformation   Takes an analytical abstraction and further reduces it into some form of visualization abstraction, which is visualizable content.
Visual Mapping Transformation  Takes information that is in a visualizable format and presents a graphical view.
Table 2.3: Transformation Operators in the Data State Model
that may include structure, attributes, relationships, and the data values themselves.
Visualization algorithms create visual representations of data using a data model. The
taxonomy is outlined in Figure 2.11. The visualization algorithms are first classified
as continuous or discrete. Scientific visualization corresponds largely to continuous
models while information visualization corresponds largely to discrete models.
Figure 2.11: The Model-based Visualization Taxonomy
Unlike Card and Mackinlay’s taxonomy, the model-based taxonomy maintains
scalar, vector, and tensor categories for dependent variables. Additionally the taxon-
omy shows greater flexibility than Shneiderman’s taxonomy, categorizing temporally
ordered 3D data as nD data in the continuous model. A limitation of the taxonomy is
that it does not treat temporal data as distinct from 1D data.
2.4 Information Uncertainty Visualization Approaches

Johnson and Sanderson argue that “development of formal theoretical frameworks and
the new visual representations of error and uncertainty will be fundamental to a better
understanding of 3D experimental and simulation data” [51, pp. 5]. This is a relatively
new field in visualization, which is generally referred to as uncertainty visualization.
However, uncertainty and its sources are diverse and the term can have broad meaning.
On the other hand, error visualization (e.g. [51, 83]) and fuzzy visualization (e.g. [94,
39, 5]) imply particular uncertainty modeling techniques. In this thesis we use the
term information uncertainty visualization to refer to visualization of all modeling
techniques where the uncertainty can be codified in information. Thus error- and fuzzy-
visualization are sub-categories of information uncertainty visualization, which is itself
a sub-category of uncertainty visualization. This relationship is shown in Figure 2.12.
Figure 2.12: Relationship between Uncertainty Visualization, Information Uncertainty Visualization, Error Visualization, and Fuzzy Visualization
Visualisation techniques map data variables and information to visual feature di-
mensions for the purpose of highlighting trends, making comparisons, identifying
outliers, examining data composition, and so on. The introduction of uncer-
tainty requires that appropriate visual features be selected to represent it. Blurring sim-
ulates the visual percept caused by an incorrectly focused visual system and therefore
has the most immediately intuitive mapping for uncertainty [92]. Blurring effectively
smears the boundary of the graphic representing the data value, creating a sense of
uncertainty as to where it begins and ends. A number of visual features may be used
in a similar manner, including hue, luminance, and saturation, and these mappings can
be extended into the temporal domain through animation [67, 34, 10].
Brown [10, pp. 84] offers a summary of available features drawn from the literature
(e.g. [44, 33, 91]):
• Intrinsic representations - position, size, brightness, texture, color, orientation,
and shape;
• Further related representations - boundary (thickness, texture and color), blur,
transparency and extra dimensionality;
• Extrinsic representations - dials, thermometers, arrows, bars, different shapes,
and complex objects such as pie charts, graphs, or complex error bars
2.4.1 Low-level Features

We now consider how several low-level features can be used to indicate uncertainty
within information. Low-level features refer to techniques applied to individual ob-
jects within a visualization. The next section describes high-level features, which refer
to techniques that involve the arrangement of two or more objects. These high-level
constructions build on the use of low-level features. The features to be considered are:
hue and luminance, opacity, blurriness, depth, texture, particles, glyphs, and sonifica-
tion.
Hue and Luminance are commonly used to highlight data that is different, or to rep-
resent gradients in the data [117, 56]. Saturation of the hue can be used to high-
light the precision or certainty of the data. The more saturated the hue, the
more certain or crisp the value contained in that region is, while low saturation
regions have the appearance of washing into each other, and can be used to in-
dicate the fuzziness of spatial region boundaries [50, 42]. Variation in hue can
also be used to indicate precision. Regions of higher uncertainty can have fewer
shades, while more precise areas have a smoother appearance. A lack of
background/foreground separation (e.g. red on purple) can also imply uncertainty,
as the region may only just be distinguishable [124]. Brown and Pham [94] used
the color hues to represent the membership values of data points. Color hues
were also used by Lowe et al. [70] to represent belief values in the form of a
flame to facilitate decision making in an anaesthetic monitoring system.
Opacity offers an intuitive method for implying uncertainty. The more uncertain re-
gions can be shown with reduced opacity, creating a ghost-like effect. The in-
verse approach, used by Djurcilov et al. [24, 25], is to map regions of high un-
certainty to high opacity, thus drawing attention to the uncertain areas in volume
visualization (see Figure 2.13). Johnson and Sanderson [51] show an example
of a Magnetic Resonance Imaging (MRI) scan with an added error volume. The
error volume represents the space of possible variation and is transparent so that
the other data is still visible.
Figure 2.13: Using opacity to show the structure of uncertainty. Color scheme (left), Normal rendering (centre), Uncertainty structure (right)
Blurriness applies the same concepts as opacity to spatial mappings. In this way the
uncertainty is indicated by the imprecise position and extents of objects.
Depth can be used to indicate an order or spatial positioning for the data. Pang et
al. [86] and Brown [10] displayed intentionally different images to each eye,
exploiting a lack of binocular fusion to indicate fuzziness. Blurring or depth
of field effects from spatial frequency components being removed in the image
plane can be used to show the indistinct nature of data points [34, 64].
Texture may be applied to indicate the level of precision, ambiguity, or fuzziness
upon an object or upon a spatial location. Pang
and Alper [84] used random normal perturbation to create a textured surface.
The effect was proportional to the amount of uncertainty, creating rough regions
where the uncertainty is high. Certain shimmering effects, usually to be avoided
in visualization [117], can be used to indicate ambiguity within the region [113].
Particles can be used to represent the uncertainty of a region or object by varying
their density, opacity, and color. Grigoryan and Rheingans [37, 38] use particle
density to indicate uncertainty. These particle clouds create a similar effect to
transparent volumes. Cartography often also uses a form of this by drawing
dashed lines to represent imprecise lines and boundaries, or by using different
dot densities to represent shading effects [36].
Glyphs are the most widespread method for displaying uncertainty. The size of a
glyph is often used to indicate a scalar measure of uncertainty. For example, error
bars are a traditional technique for indicating errors in measurement [117]. The
larger the error bar, the more uncertainty there is. This concept was expanded
upon by Pang and Freeman [85], who used the size of spherical and ellipsoidal
glyphs to indicate uncertainty in radiosity applications. Lodha et al. [67] inves-
tigated uncertainty glyphs for flow visualizations, also using length to indicate
degree of disagreement. In separate work [68] they used glyphs to show variation
between surface interpolants, finding them to be more precise than using other
features. Wittenbrink et al. [125] mapped variation in vectors to glyph length
and width, to show uncertainty in magnitude and direction. In the same work
they explored glyphs in keyframed animation to expose differences between in-
terpolation techniques.
Sonification is an approach that was explored early on. There are two main meth-
ods, one is to map the uncertainty directly to the pitch or volume, while the
second uses the degree of uncertainty to regulate a noise generator. Fisher [31]
allowed the user to scan a cursor over a landscape while the program emitted
sound depending on the degree of uncertainty. Lodha et al. [66] went further by
allowing multiple sound variables to be mapped simultaneously, thus increasing
the amount of information conveyed.
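As an illustration of the saturation and opacity mappings discussed above, the following sketch desaturates and fades a color in proportion to its uncertainty; the scaling constants are arbitrary assumptions, and Djurcilov et al.'s inverse mapping would instead increase opacity with uncertainty:

```python
import colorsys

def uncertainty_to_color(hue, uncertainty):
    """Map uncertainty in [0, 1] to a desaturated, more transparent color.
    Fully certain values render saturated and opaque; uncertain values
    wash out and fade into a ghost-like appearance."""
    saturation = 1.0 - uncertainty
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    alpha = 1.0 - 0.8 * uncertainty  # keep even vague regions faintly visible
    return (round(r, 3), round(g, 3), round(b, 3), round(alpha, 3))

certain = uncertainty_to_color(0.0, 0.0)  # pure, opaque red
vague = uncertainty_to_color(0.0, 1.0)    # washed-out, nearly transparent
print(certain, vague)
```

The same scalar could equally drive blur radius, particle density, or glyph size; the point is that a single uncertainty value feeds a low-level visual channel.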
These low-level features offer an added dimension to which we can map uncertainty
information for a particular plot point. Zhou and Pang [126] looked at several examples
to visualize the level of error between original and reduced resolution meshes in a
multi-resolution mesh algorithm (see Figure 2.14). We now consider how these are
used in higher level constructions and methods that require multiple data points. In our
discussion we include different spatial arrangements, use of image based techniques,
addition and modification of geometry, and the use of animation.
Figure 2.14: Some visual mappings for showing difference. From left to right: overlay, rainbow mapping, white-black-white pseudo-coloring, glyph (hi-pass filter), glyph (low-pass filter)
2.4.2 Higher-level Constructions

Uncertainty can be represented in several ways using 2D Cartesian graphs. Some ex-
amples of graphs include histograms, bar charts, tree diagrams, time histories of 1D
slices, maps, iconic and glyph-based diagrams. For example, graphs are often used to
represent the fuzzy membership functions (e.g. Figures 2.1-2.4) or probability density
functions. The structure and inter-relationships of rules can be illustrated using graphs,
trees and flowcharts.
Fuzzy rules involving two inputs can be graphed in three dimensions. Figure 2.15
shows an example from the Matlab Fuzzy Toolbox [75], where the output shows the
amount of tip, as determined by the quality of the food and service. Nürnberger ex-
plored drawing such classifiers as overlapping pyramid shapes. 2D classifiers are visu-
alized as contours for a top-down view [80], whereas 3D classifiers are 3D shapes. An
extension to this work discusses the effects that antecedent pruning has on the shapes
[81]. Pruning of antecedents involves removal of restrictive rules and simplification
of existing rules with the aim of improving the ability of the classification system to
generalize to previously unseen input data. The authors argue that rule simplifications
can have a dramatic impact on results and that visualization of these changes can pro-
vide an intuitive aid for fuzzy classifier designers. While the technique produces an
intuitive aid, the authors have not gone far enough. Since the classifier is visualized as
a shape that occupies the same space as the data, it suggests that it can be visualized
together with the data. This would allow the user to observe how the data points of a
particular data set classify, particularly when combined with animation or interactive
techniques. Possible extensions include using size, color, and translucency to enhance
perception of the classification given to a data point. Cox et al. [20] applied thresholds
to produce convex hull plots of data point clusters, using glyphs of different shapes and
sizes for the data points.
Figure 2.15: How much tip should be given based on the quality of the food and service, using fuzzy inference
A limitation of these techniques is that they are not well suited to multi-dimensional
data. Techniques such as multi-dimensional scaling [6] and parallel coordinates [39]
provide ways to display multi-dimensional fuzzy data in 2D without loss of informa-
tion. However, the degree of membership is not indicated in a standard parallel coordi-
nate plot. Berthold and Hall [5] use blurring to expose the level of fuzzy membership
on parallel coordinates. An alternative proposal by Pham and Brown [91] extends
coordinates to the third dimension, where the new dimension represents the member-
ship value. One technique for multi-dimensional scaling involves an algorithm that
minimizes the error in the inter-point distances. The rule set is then visualized as a 2D scatter
plot, where grey scales denote different classes and the size of each square indicates
the number of examples [94]. Another technique for viewing high-dimensional fuzzy
rules in 2D places rules as shapes on a grid. The distance between rules in high-
dimensional space is mapped to their distance in 2D. The technique uses a gradient
descent algorithm to minimize the error between the 2D and actual distances [6].
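A minimal sketch of such a gradient-descent layout is given below; the step size, iteration count, and update rule are illustrative assumptions rather than the algorithm of [6]:

```python
import math, random

def mds_2d(points, steps=2000, lr=0.05, seed=0):
    """Sketch of gradient-descent multi-dimensional scaling: place each
    high-dimensional rule/point in 2D so that pairwise 2D distances
    approximate the original high-dimensional distances."""
    rng = random.Random(seed)
    n = len(points)
    # Target distances measured in the original space.
    target = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Random initial 2D layout.
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                # Move i along the i-j direction to shrink the distance error.
                g = lr * (d - target[i][j]) / d
                pos[i][0] -= g * dx
                pos[i][1] -= g * dy
    return pos

rules = [(0, 0, 0), (1, 0, 0), (0, 3, 0)]  # three "rules" in 3D
layout = mds_2d(rules)
print(round(math.dist(layout[0], layout[1]), 2))
```

Three points always embed exactly in 2D, so the residual error here comes only from the finite number of descent steps; for larger rule sets the minimized stress is what the 2D picture distorts.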
When visualizing clusters it is often a requirement to find outliers in the data. One
method to improve the identification of outliers in fuzzy classification problems is to
modify the “objective function”. Keller proposed additional weighting parameters for
“representativeness” [55, pp. 143]. The application of this technique produces the
same principal clusters, but outliers are more easily detected since they are excluded
to a greater degree from the fuzzy clusters.
Fujiwara et al. [32] and Gershon [34] produced a 3D flowchart to represent rule
structure to facilitate understanding of rule-based programs. This is an extension of
the cone tree visualization technique [102]. Dickerson et al. [23] used a graph to
encode relationships in a complex interacting system. This technique is useful for
encoding expert information which is commonly present in fuzzy control systems.
Brown and Pham [11] extended these techniques further by mapping uncertainty to
additional features (such as opacity) for each node.
Image based techniques can also be used to convey uncertainty. These methods are
the uncertainty analogs of image-based visualization techniques such as Line Integral
Convolution (LIC) [40]. In these methods a pattern is generated that abstractly reflects
the uncertainty. One difference between image based techniques and glyphs is that
image based techniques apply a regular pattern over a continuous area. This avoids
clutter sometimes experienced by glyph techniques where the glyphs obstruct one an-
other. Sanderson et al. [104] used reaction-diffusion models in flow visualizations and
conveyed uncertainty through spot size and orientation.
Pang and Freeman [85] (see also [86]) observed that geometry can be added or
modified to indicate uncertainty. An example of modification is to create a texturing-
like effect by perturbing the orientation of faces within a geometric mesh model. The
amount of perturbation is governed by the degree of uncertainty. There are two com-
mon examples of adding geometry. The first is to add geometry for a single data point,
typically to give a direct indication of the extent over which the object can exist. An-
other is to connect successive data point extents, simulating the volume of possibility.
An example of the latter was demonstrated by Lopes and Brodlie [69], who used tubes
for particle flow visualization.
All of the low-level visual features that have been discussed can be animated. For
example, using motion blur, flickering, animated glyphs, etc. to represent the precision
of the measurements of a moving object [125, 10]. Brown [10] explored temporal vi-
brations for conveying uncertainty. The vibrations oscillate between values fast enough
to be pre-attentive [116], causing a shimmering effect that implies uncertainty about
its true position. Figure 2.16 shows frames from a movie of the luminance oscillation
technique, with a region of high uncertainty framed within the dashed rectangle. This
technique can also be applied in stereographic displays to facilitate a lack of binocular
fusion [10].
Figure 2.16: Two frames from an animation that uses a shimmering effect to indicate uncertainty by oscillating luminosity in regions of high uncertainty
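The luminance-oscillation idea can be sketched numerically as follows; the amplitude scaling and oscillation rate are illustrative assumptions, not parameters from Brown's implementation:

```python
import math

def luminance_frames(base, uncertainty, n_frames=8, rate=2.0):
    """Sketch of luminance oscillation: a region's luminance oscillates
    around its base value with amplitude proportional to the region's
    uncertainty, producing a shimmer where uncertainty is high."""
    frames = []
    for k in range(n_frames):
        phase = 2 * math.pi * rate * k / n_frames
        lum = base + 0.5 * uncertainty * math.sin(phase)
        frames.append(max(0.0, min(1.0, lum)))  # clamp to displayable range
    return frames

steady = luminance_frames(0.5, 0.0)   # certain region: no shimmer
shimmer = luminance_frames(0.5, 0.8)  # uncertain region: visible oscillation
print(max(shimmer) - min(shimmer))
```

Played back faster than the pre-attentive threshold, the larger swing in the uncertain region reads as shimmer while the certain region stays visually still.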
Probability distributions are often graphed as 2D line graphs. Kao et al. [52, 53,
54] explored showing multiple data points, each of which is subject to a probability
distribution. In one example they overlaid the probability density functions for points
of interest, as shown in Figure 2.17. Other approaches included using color, texture,
and heightmaps to indicate uncertainty. Luo et al. [72] plotted many small histograms
in a small multiples [117] technique.
The Geographic Information Systems (GIS) field has had a particular interest in
information uncertainty visualization. MacEachren et al. [74] and Slocum et al. [109]
methodically review the state of play with respect to uncertainty visualization in GIS.
Figure 2.17: A visualization that draws the probability density function over associated data points
Outside of this field, the development of visualization techniques for information un-
certainty is typically ad hoc, being created for a specific modeling technique or appli-
cation. All of these represent important steps forward; however, an integrated frame-
work to manage the modeling and visualization of information uncertainty is currently
missing.
2.5 Summary

In summary, the fields of information uncertainty modeling and general visualization
have each been well studied separately. Several visualization techniques have been created
for the various information uncertainty models. However, which one is best suited to
the task at hand, and how will uncertainty modeling and propagation be tracked and
interpreted properly? What happens when the information uncertainty changes? In-
formation uncertainty modeling represents our knowledge and expectation about the
behavior of a variable under uncertainty, and this knowledge may be subject to change
over time, particularly as new information comes to light. Currently, there is no inte-
grated framework for the modeling, propagation, and visual mapping of information
uncertainty. Furthermore, there is no framework that can adapt to changes in informa-
tion uncertainty.
CHAPTER 3
Framework for Integrated Uncertainty Modeling and Visualization
3.1 A New Approach to Information Uncertainty

Traditional visualization systems, which typically do not deal with information uncer-
tainty, can still be subject to dynamic data. For these systems the dynamism refers to
changes in value. The result is that the visualization needs to be recalculated, which is
a straightforward process. Changes in information uncertainty, on the other hand, pro-
vide a unique challenge: the actual modeling technique used can change in response
to changing information. Therefore, the data type of the variable can be dynamic. The
data type refers to the form in which the information is stored and managed. This is
clearly illustrated by the case of prediction: before the event comes to pass, the un-
certainty might be modeled using a number of different techniques; once the event
has come to pass, the prediction can be updated with the actual outcome.¹ Thus the
visualization must not only be recalculated, but must also adapt to this new data type.
Adapting a visualization to new data types is not a straightforward process.
Visualization techniques are designed for a particular data type and may not support the
new data type without modification. For example, line graphs rely on a series of values
between which line segments are connected. Should the source of information be
defined by a series of intervals, then the traditional line graph is no longer appropriate.
One suitable modification turns the line segments into convex polygons, whose edges
are defined by the upper and lower bounds of the interval.
¹ Assuming the outcome is known, the data type then becomes one of absolute certainty.
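The line-graph modification just described, where each segment becomes a convex polygon bounded by the interval's upper and lower bounds, can be sketched as follows; the function name and data are illustrative:

```python
def segment_polygons(xs, lows, highs):
    """Each line segment of a traditional line graph becomes a convex
    quadrilateral whose top and bottom edges follow the upper and lower
    interval bounds at the two endpoints."""
    polys = []
    for i in range(len(xs) - 1):
        polys.append([
            (xs[i],     highs[i]),      # upper-left
            (xs[i + 1], highs[i + 1]),  # upper-right
            (xs[i + 1], lows[i + 1]),   # lower-right
            (xs[i],     lows[i]),       # lower-left
        ])
    return polys

polys = segment_polygons([0, 1, 2], lows=[1.0, 1.5, 1.2], highs=[2.0, 2.5, 3.0])
print(len(polys), polys[0])
```

Rendering these quadrilaterals in place of the line segments yields the band-style graph of Figure 3.1(c); a point-valued series (lows equal to highs) degenerates back to the ordinary line graph.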
The importance of conveying uncertainty in visualization systems is well recognized [83].
Many problems are subject to uncertainty and, as a consequence, visualization research
has produced several modifications to visualization techniques to support uncertainty.
For example, transparent volumes have been added to volume renderings to indicate
potential error (e.g. [51, pp.9]), and parallel plots were extended into the third dimen-
sion to handle fuzzy variables [94]. This type of work continues, and the outcomes
continue to be data type specific.
The objective of this thesis is to integrate the process of modeling and visualizing
information uncertainty into an extensible and adaptive visualization framework. Such
a framework will provide greater uniformity for the field and enable both practitioners
and researchers to reduce the data type and visualization technique dependency. The
process that the user follows when armed with such a tool can therefore change.
The typical process that a user follows when dealing with uncertainty consists of
the following steps.
1. Decide on variables
2. Decide on uncertainty data type(s)
3. Build the data model, propagating uncertainty manually
4. Construct visualization(s), incorporating uncertainty where techniques are avail-
able and appropriate
In practice, steps 3 and 4 will be repeated as information changes. However, step 2 will
rarely be revisited. The significant point of this process is that the uncertainty model
is decided upon before the user’s data model is built. This can be unintuitive, as the
amount of uncertainty can change depending on how the model evolves. If it were
easy to add, change, or remove uncertainty details at any point in the process, then the
typical process changes, as follows.
1. Decide on variables
2. Build an initial data model
3. Construct visualization(s)
4. Add/remove/change uncertainty information
Step 4 can occur anywhere after step 1 and can be repeated as often as is necessary.
Under such a process, the uncertainty information is viewed as a refinement of details
that does not fundamentally change the data model.
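One way to support such deferred refinement can be sketched as follows; `UncertainValue` and its methods are hypothetical names invented for illustration, not part of an existing library:

```python
class UncertainValue:
    """Sketch of treating uncertainty as a removable refinement: the data
    model works with plain values, and uncertainty details can be attached,
    changed, or dropped at any point without restructuring the model."""

    def __init__(self, value, uncertainty=None):
        self.value = value
        # e.g. None, ("interval", lo, hi), ("normal", mean, stddev), ...
        self.uncertainty = uncertainty

    def set_uncertainty(self, model):
        self.uncertainty = model

    def clear_uncertainty(self):
        self.uncertainty = None

# Step 2: build the data model with plain values first.
employment = UncertainValue(17.2)
# Step 4 (later, repeatable): refine with uncertainty details.
employment.set_uncertainty(("interval", 16.8, 17.9))
print(employment.value, employment.uncertainty)
employment.clear_uncertainty()  # the data model itself is unchanged
```

Because the value and its uncertainty annotation are held together but kept distinct, changing the uncertainty technique never forces the variable, or the model built around it, to be restructured.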
This chapter describes an integrated framework for the modeling and visualization
of information uncertainty. This framework is adaptive to changes in uncertainty in-
formation, allowing the user to select the appropriate techniques for the task at hand.
In Section 3.2 we consider the issues that must be overcome, from which we derive re-
quirements of this framework. Section 3.3 describes the components of the framework
to meet the requirements. Section 3.4 provides a summary of key points.
3.2 Analysis of Issues and Requirements

This section examines the issues that confront users when they seek to visualize infor-
mation uncertainty. From a theoretical perspective there are three main issues. Firstly,
visualization techniques are based around specific uncertainty data types. Thus, the
selection and application of visualization techniques has a tendency to be ad hoc.
Secondly, there is incoherence between information uncertainty modeling techniques.
This locks users into a particular modeling technique, the appropriateness of which
may change as the information evolves. Thirdly, information uncertainty modeling and
visualization is hampered by an artificial separation between the value of a variable and
the uncertainty model of that value. This poses problems that affect the robustness of
user models and the effort required to maintain them.
From a practical point of view, the user is required to have both a comprehen-
sive understanding of uncertainty as well as sophistication with visualization tools.
Comprehensive understanding is required, because the user must manually encode and
propagate uncertainty information; sophistication with visualization tools is required
to allow for the unusual demands of mapping uncertainty to visual elements. Many
tools lack support for information uncertainty modeling and visualization, leading de-
termined users to cobble together multiple tools.
3.2.1 Ad hoc Visualization Techniques

Sensemaking Cycle and Changes in Uncertainty Information
Visualization is “the bringing out of meaning in information” [56]. It is performed it-
eratively and usually as part of the sensemaking cycle [103, 17]. The iterative looping
is not exclusive to mapping data into visual form; instead, users sometimes return to
the data model to gather or transform data. This is particularly true for information un-
certainty. For example: uncertainty details can be deemed to be more important later,
once the basic model is in place; or the uncertainty details may change as more be-
comes known about the variables. Therefore, frameworks for information uncertainty
visualization should ideally allow the user to go back to make changes with minimal
effort.
Flexibility
Visualization of information uncertainty is different to visualizing other forms of in-
formation for two main reasons. Firstly, information uncertainty is always associated
with a particular unit of information. This means that the uncertainty cannot be freely
visualized without regard to its interpretation relative to the information to which it
belongs. Secondly, information uncertainty is usually mapped differently to visual el-
ements. For example, uncertainty is commonly mapped to intrinsic properties, such as
transparency or color; or by adding a dimension to geometry, such as using a surface
where there would otherwise be a line. Therefore, a visualization system for informa-
tion uncertainty requires the flexibility to allow users to map uncertainty to compound
visual elements, including intrinsic properties and adding dimensions to geometry.
Figure 3.1 demonstrates how information uncertainty is associated with informa-
tion, but typically mapped differently to visual elements. Four graph visualizations of
historical and predicted employment rates in California are shown. The first graph (a)
assumes that growth will continue at the average growth rate of the past 15 years and
is therefore visualized using traditional means. While the information in graph (a) is
modeled as not being subject to uncertainty, it rests on an unreliable assumption about employment rates. The graph in (b) likewise estimates that growth will continue
at the average rate. The fact that the predictions are estimates is indicated by the line
stippling, an intrinsic property of the line. The graph in (c) shows the possible range
within the maximum and minimum growth rates experienced in the past 15 years. The
uncertainty is indicated by extending the one dimensional line into a two dimensional
polygon. The graph in (d) uses a normal distribution centered on the average growth
rate. The uncertainty is indicated both by extending the dimensionality of the line and by mapping to the intrinsic property of opacity.
Heterogeneity in Uncertainty Information
Several uncertainty visualization techniques have been developed for particular uncertainty types. However, in an environment where the uncertainty type can change to better suit the needs of the user, such restrictive preconditions mark a return to the tyranny of uncertainty type lock-in. Therefore, visualization of information uncertainty requires greater consistency across different uncertainty modeling techniques.
Homogeneous Access
To enable the visual mappings that expose the uncertainty in variables, it is necessary
to have access to the associated uncertainty details. However, there are numerous un-
certainty modeling techniques that use different methods for encoding the uncertainty.
Figure 3.1: Visualizations of Employment Numbers in California. Years 2005-2010 are predicted. (a) Assuming Average Growth (b) Indicating Growth is Estimated (c) Possible Growth (d) Likely Growth. (Data Source: California Employment Development Department)
This creates a barrier to visualizing uncertain information because visual mappings
that work with one uncertainty modeling technique may not be easily transferable to
another. Such inconsistency creates a strong dependency between visualizations and
the data types used in the model, limiting the user’s ability to update the data model.
Therefore, a generalized means for accessing uncertainty information should be sought
to enable a consistent environment for information uncertainty visualization.
Plurality of Values
Fundamental to the concept of information uncertainty is the ability for a variable to
hold multiple values simultaneously; in other words, the variable has multiple possible
collapses. This plurality of values represents the deferral of the approximation decision: the true value of a variable may be one of multiple candidates, each of which should
be considered a possibility.
3.2.2 Incoherence of Uncertainty Models

Uncertainty Data Type Lock-in
There is usually no support for changing from one uncertainty modeling technique to
another. Adding uncertainty information to data allows the user to specify a greater
level of detail about the data. However, changing the uncertainty data type typically
requires users to reconstruct the affected portion of the data model, often involving a
fundamental change in form. This makes the data model rigid and, as a consequence,
users will typically need to anticipate their use of uncertainty and build their model
accordingly.
Manual Propagation of Uncertainty
Since it is usually up to the user to manage and interpret uncertainty parameters, the
use of information uncertainty requires the user to have a mathematical understanding
of modeling techniques. This would explain why, although uncertainty modeling is
common in disciplines such as engineering and physics, it is often under-used in other
domains. While only some understanding is required when declaring the uncertainty,
the subsequent propagation of uncertainty, due to interaction between uncertain vari-
ables, requires more detailed understanding of the mathematical principles involved.
To ease this burden, the system should facilitate the automatic propagation of uncer-
tainty.
There are different mathematical models that are available for the propagation of
uncertainty under the various uncertainty data types. For some applications it is im-
portant that a particular mathematical model be used, and therefore the user must be capable of specifying which model applies. In most environments it is up to users to
implement the correct model. However, any automatic propagation system must also
facilitate this choice.
Propagation in Heterogeneous Operations
The automatic propagation system should also cope with the situation where an op-
eration combines two variables of different uncertainty data types. One solution is to
comply with the principle of requisite generality [60], where variables are converted
into an uncertainty type general enough to express the resulting uncertainty. However,
there may be multiple methods for conversion. The user may wish to provide a specific
mathematical model that is better suited to their domain, or they may even wish to dis-
card some uncertainty information as a simplification. Therefore, the propagation of
uncertainty in heterogeneous operations should also facilitate choice of mathematical
model.
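As an illustration of requisite generality, the following sketch (hypothetical Python; the framework itself is not bound to this representation) promotes an exact value to a degenerate interval so that a single interval rule can combine heterogeneous operands:

```python
# Sketch under assumption: a minimal "requisite generality" promotion,
# where an exact quantity combined with an interval is first widened into
# a (degenerate) interval so the general interval rule can apply.
def to_interval(value):
    if isinstance(value, tuple):   # already an interval (lo, hi)
        return value
    return (value, value)          # exact value -> degenerate interval

def add(a, b):
    (alo, ahi), (blo, bhi) = to_interval(a), to_interval(b)
    return (alo + blo, ahi + bhi)

add((8, 12), 5)   # interval + known quantity -> (13, 17)
```

A domain-specific conversion, or one that deliberately discards detail as a simplification, would replace `to_interval` while leaving the dispatch unchanged.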
3.2.3 Artificial Separation of Information and Uncertainty

Separability of Parameters
The closely related parameters of the uncertainty data type are often treated as separate
variables. For example, rather than declaring a variable as being modeled using a prob-
ability, many environments require separate variables for mean and variance. Thus, if
α were a variable subject to a normal probability distribution, it is stored as two separate variables: α_μ and α_σ. This lack of structure is akin to use of the goto statement
before the advent of structured programming, because the burden is upon the user to
treat these variables as being connected. This separation has two significant ramifi-
cations: firstly, it is easier to introduce errors since the environment does not enforce
any semantic properties of the uncertainty parameters; and secondly, the introduced
complexity discourages users from using uncertainty modeling techniques. Therefore,
the uncertainty parameters should be treated as part of a unit and the system should
enforce the semantics of the parameters.
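The contrast can be sketched as follows (illustrative Python; the class and its propagation rule are assumptions, not the prototype's API): the two parameters live inside one value whose semantics the environment can enforce, rather than being two loose variables the user must keep in sync.

```python
# Illustrative sketch: mean and standard deviation encapsulated in one unit.
class Gaussian:
    def __init__(self, mean, sigma):
        if sigma < 0:
            # The environment enforces the semantics of the parameters.
            raise ValueError("standard deviation must be non-negative")
        self.mean, self.sigma = mean, sigma

    def __add__(self, other):
        # Sum of independent normals: means add, variances add.
        return Gaussian(self.mean + other.mean,
                        (self.sigma ** 2 + other.sigma ** 2) ** 0.5)

alpha = Gaussian(5.0, 1.0)   # one unit, not two separate cells
```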
Model Rigidity
For users to visualize information uncertainty, that uncertainty must be declared some-
where. The declaration of the information uncertainty should be co-located with the
information to which it relates, since the two are fundamentally connected. However,
this relationship is neglected in most environments, which instead require the user to
declare the parameters of the uncertainty separately from the variable. For example,
the variable α might have the value 5, but another variable, α_confidence, is required to hold information about the certainty of α. This results in an added layer of complexity and the user is faced with an increasingly intricate data model. Furthermore, there
is often no support for changing from one uncertainty modeling technique to another, so changing techniques typically requires the user to reconstruct the affected portion of the data model. This rigidity again forces the user to anticipate their use of uncertainty and build their model accordingly, and the data model becomes progressively more intricate and error prone.
3.3 Components of the Framework

Several components are required to build an integrated uncertainty modeling and vi-
sualization framework. Figure 3.2 illustrates the components of the framework and
the chapter that describes each component in detail. The three main components are
the spreadsheet paradigm for information uncertainty, uncertainty encapsulation and
automated propagation, and uncertainty abstraction for visualization.
[Diagram: the framework (Chapter 3) comprises the Spreadsheet Paradigm for Information Uncertainty (Chapter 4); Uncertainty Encapsulation and Automated Propagation (Chapter 5), covering uncertainty encapsulation and uncertainty propagation models; and Uncertainty Abstraction for Visualization (Chapter 6), covering uncertainty abstraction models and user-objectives based visualization.]
Figure 3.2: Description of the framework
Features of the framework are:

• the ability to change uncertainty details in the model whenever necessary;

• no requirement for the user to be intimately aware of uncertainty modeling techniques, since the system manages propagation and avoids basic mistakes;

• built-in support for visualizations that incorporate information uncertainty into their display;

• built-in support for modeling of information uncertainty;

• extensibility, allowing practitioners to add new data types and visualization techniques in future.
We now give an overview of each of the three components.
3.3.1 Spreadsheet Paradigm

Visualization systems are commonly based on a data-flow network paradigm. How-
ever, the spreadsheet paradigm has been shown to offer advantages for visualiza-
tion [45, 46]. Features of the spreadsheet are that it visually lays out documentation,
intermediate data, and model logic for inspection. The data contained in a spreadsheet
is always up to date, which offers an intuitive view of the supporting data used in a vi-
sualization. Furthermore, spreadsheets are widespread for modeling tasks and widely
understood. The framework described here builds on the spreadsheet paradigm to take
advantage of these features.
A fundamental extension that we make to the spreadsheet paradigm is to support
uncertainty information at the sub-cell level. Figure 3.3 shows the new conceptual
structure. Workbooks are the top level objects and typically there will only be one
workbook open at a time. The workbook contains several spreadsheets, which are
a matrix of cells. Each cell contains a unit of data, typically either a text label, a
formula, or a numeric value. The cell types are extended such that each cell may
include uncertainty details.
[Diagram: Workbooks have Spreadsheets, which have Cells, which have Uncertainty Parameters.]
Figure 3.3: Additional Layer in Spreadsheet Hierarchy
The use of formulae creates functional relationships between cells. Formulae tra-
ditionally operate at the cell level and thus require two extensions to operate in the new
environment. Firstly, the execution of formulae needs to take uncertainty details into
account. This is mostly transparent to the user. Secondly, we introduce new functions
that allow the user to programmatically access uncertainty details of cells. These func-
tions enable users to pack and unpack uncertainty information contained in cells using
formulae.
Figure 3.4 shows a prototype implementation that demonstrates the application of
this framework. Information uncertainty can be entered into cells and it is automati-
cally propagated through the formulae. The currently highlighted cell E4 contains a
formula that combines an interval with a known quantity, resulting in an interval. A
similar formula is repeated for all cells in the range B4:F7.
3.3.2 Uncertainty Encapsulation

The freedom to choose the appropriate modeling technique for the uncertainty informa-
tion at hand requires an ability to change techniques with minimal disruption. We treat
the uncertainty information as intrinsically connected to the unit of information, an
approach we call uncertainty encapsulation. This allows the user to specify variables
and their uncertainty such that the system is intrinsically aware of the relationship.
Two components are required to facilitate the adaptive modeling characteristics.
The first is an ability to enter variables as uncertain quantities, which uses encapsula-
tion. The system enables the user to specify the modeling technique and uncertainty
parameters within a single cell. Since this information is treated as a unit, the sys-
tem can protect and manage that information. The second component is a suite of
propagation methods that dictate the way uncertainty information is carried forward
through operations. The method is indexed by the desired operation and the types of
parameters. The formula system invokes the appropriate handler using a look-up table.
The screen shot in Figure 3.4 shows several different uncertainty modeling tech-
niques being used to model the variable change. At any time the user can navigate
to any of the input variables and modify them to contain any supported type of un-
certainty. The uncertainty is then automatically propagated to the other cells by the
formulae. Thus, the system adapts to the modeling technique automatically.
3.3.3 Uncertainty Abstraction

Uncertainty data type abstracted visualization ensures that the visualization can cope
with changes in the uncertainty of information. Visualization techniques are built for
generic abstract types and will therefore no longer be bound by the underlying data
type.
To achieve this we introduce two main uncertainty abstraction models, which spec-
ify the interface to uncertainty information. The first is the Unified Uncertainty Model,
which does not distinguish between different types of uncertainty. This model is sim-
plistic in nature but provides sufficient detail to produce all of the visualizations listed
in the background Section 2.4. The second is the Dual Uncertainty Model, which dis-
tinguishes between possibilistic and probabilistic views. This model can be used when
such a distinction is required by the domain or task. Further abstraction models are
possible, which we briefly cover in Chapter 6.
Traditional approaches to visualization are primarily data type centric. This clearly
is not an appropriate approach for data type abstracted visualization. Task-driven ap-
proaches have recently appeared (e.g. [11, 57, 82]), where visualization criteria are
derived from task requirements. The main drawback of these approaches is that they
tend to be application domain specific and therefore lack generality. We argue for a
User-objectives based approach, which is data type independent and not application
domain specific. We provide categories of user-objectives for information uncertainty
visualization. Each objective seeks to highlight a particular aspect of the information
uncertainty according to the insight that the user is trying to gain. We also present an
algorithm for eliciting objectives from the user.
3.4 Summary

This chapter outlined a new approach to information uncertainty modeling and visu-
alization. We provided an analysis of the issues that need to be addressed, which fit
into three main categories. Firstly, current visualization techniques tend to be ad hoc.
Secondly, different uncertainty models are not necessarily consistent with one another.
Thirdly, there exists an artificial separation between variables and their uncertainty de-
tails. This separation effectively promotes uncertainty details to the same level as the
data to which they belong, increasing the potential for user induced error.
We presented a framework that addresses these issues by taking an encapsulation
and abstraction approach to provide an integrated framework. The framework con-
sists of three main parts. The first is an extension to the spreadsheet paradigm as a
visualization platform for information uncertainty. The second is uncertainty encapsu-
lation, which treats information and its uncertainty as a unit. The third is uncertainty
abstraction, which enables visualizations to be built independently of the uncertainty
modeling technique used in the data. Each of these components is detailed in the
following chapters.
Figure 3.4: Screenshot of the Prototype System
CHAPTER 4

Spreadsheet Paradigm for Information Uncertainty
This chapter describes an integrated visualization and modeling system design that uses
a spreadsheet paradigm. This system integrates the modeling and visualization tasks,
allowing a tight feedback loop between visual inspection and data model building.
4.1 Motivation and Objectives

The relationship between spreadsheets and other approaches to data models can be
illustrated using a formal definition. Spreadsheets consist of four components [48]:
the schema, a definition of the spreadsheet logic; the data, which are the instance
values for this spreadsheet; the editorial, which consists of headings, borders, etc.; and
binding, which is the mapping of the content to the tabular structure of cells. It is the
binding property that is responsible for the tabular layout of a spreadsheet.
The spreadsheet paradigm allows a great amount of freedom for users to organize
their information. The freedom to quickly perform experimental calculations that do
not affect the rest of the data model facilitates exploration tasks. However, the draw-
back of this freedom is that spreadsheets can be error prone. The fact that spreadsheets
are so widespread and yet capable of errors has motivated much research into spread-
sheet testing methods [9, 30], particularly where spreadsheets are used for financial
decisions.
The terminology used in this section is as follows. The workbook is made up of
sheets. A sheet is a heterogeneous sparse two-dimensional grid of cells. Sheets are
also theoretically infinite, but practically constrained due to resource limitations. The
heterogeneity refers to the ability to have cells of different types within the same sheet.
They are sparse because cells can be empty. A cell is an addressable spatial location
that contains a unit of information. We use the term uncertainty spreadsheet to refer to
any spreadsheet that includes information uncertainty and reserve the term uncertainty
visualization spreadsheet for spreadsheets that include both information uncertainty
and visualizations.
Our approach integrates both the visualization and modeling tools into a single
system. Spreadsheets are ideal for this because they are interruptible, widely under-
stood, and in a constantly running state. The interruptible characteristic allows the
user to move to another location in the spreadsheet to experiment, without interfering
with their main task. They are widely understood by users because spreadsheet use is
ubiquitous, especially in the financial modeling field. Finally, unlike scripts that must
be run before they produce results, a spreadsheet is constantly in an up-to-date state,
allowing it to be easily interrogated and refined.
There are numerous information uncertainty modeling and display techniques, and
new ones continue to be developed. Therefore a plug-in based architecture is used to
allow new uncertainty data types and display techniques to be added to the system. Fur-
thermore, there are multiple mathematical models for the propagation of uncertainty.
Users require an ability to choose appropriate functions for their task and the system
will need to present the user with options that are semantically valid. Therefore, the
plug-in system also allows new mathematical models to be added.
To support the needs of visualizing information uncertainty, the system must be
capable of mapping uncertainty information to intrinsic properties and geometric ex-
tensions, in addition to stand-alone visual elements. To provide flexibility, the sys-
tem should allow the user to build the visualization using as many visual elements as
needed. The visual elements can then be provided through the plug-in system to allow
extensibility, particularly since new display techniques continue to be developed.
4.2 Related Work on Spreadsheets

The spreadsheet paradigm is widely understood for managing numerical information,
prompting researchers to explore other uses. An early proposal to generalize spread-
sheets is the Analytical Spreadsheet Package [95] (ASP), which allowed any Smalltalk-
80 object to be placed inside a cell and used Smalltalk messages as formulae. While
this provides flexibility, it is too general and complicated for non-expert users to under-
stand. However, ASP did anticipate many of the ideas that were explored in subsequent
papers, such as widgets, which are available in Spreadsheets for Images [65] (SI). SI
extends spreadsheets to include graphical objects, including several different widgets
and images. Further, SI takes the unusual step of allowing formulae to write their re-
sults to a different cell. While this offers flow control, it can complicate the user’s
interpretation of the spreadsheet.
FINESSE [123] specifically targeted real-time financial information, adding im-
ages, heat maps, and graphs to the regular cell types. FINESSE introduced “pre-
sentation relationships”, where groups of cells have access to common presentation
attributes. This provides for shared memory that is not shown in a cell. However,
other systems (including SI) achieve a similar effect by storing presentation attributes
in cells.
The Spreadsheet for Information Visualization [45, 46] (SIV) explored more gen-
eral visualization, building on the Visualization Toolkit [105] (VTK). Each cell in SIV
can contain a visualization, including the data sets used to drive the visualization. Vi-
sualization related operators are available and can operate on multiple cells, such as
a whole column. SIV is motivated by the ability to compare visualizations side-by-
side, particularly to see incremental changes, which is referred to as “small multiples”
after [118, p. 67]. Further, a key advantage of spreadsheets is to use templates for
analysis and experimentation. While suited to visualization tasks, SIV is not partic-
ularly suited to modeling as it is optimized for fewer cells containing larger data-sets
and has dispensed with traditional text and numerical cells.
VisTrails [4, 13] specifically uses a spreadsheet for displaying multiple visualiza-
tions for side-by-side exploration. In this sense the term spreadsheet refers to the
tabular appearance rather than any ability to create formula driven relationships. Sim-
ilarly, tabular visualization methods, such as Hyperslice [122] and TableLens [98],
share some similarity to spreadsheets. However, traditional spreadsheets are sparse,
allow a mix of cells, and offer inter-cell dependencies through formulae. These prop-
erties lend spreadsheets a paper-likeness that separates them from regular tabular displays.
4.3 Architecture and Features

The goal of this approach is to directly support uncertainty information within spread-
sheet cells, in a managed and extensible way. The process for construction of a spread-
sheet using this approach is described later in Section 4.4.
There are two fundamental extensions to traditional spreadsheets that provide for
uncertainty visualization spreadsheets: uncertainty encapsulation, where the uncer-
tainty details are kept together with the information as a unit; and uncertainty ab-
straction, which enables homogeneous visual mapping of uncertainty details despite
differences in modeling techniques.
4.3.1 Uncertainty Encapsulation

We extend the traditional spreadsheet to include novel cell types that facilitate infor-
mation uncertainty modeling techniques. There are a number of different modeling
techniques, but they are all fundamentally similar in that they describe the uncertainty
landscape surrounding a value. Thus, they should be compatible with one another and
should be able to interact using formulae.
The basic data types commonly found in a spreadsheet are shown in Figure 4.1.
ICell is the interface that all cell types must implement. Empty is a cell type that
exists to hold formatting information, such as borders and changed background color.
Label cells hold arbitrary text strings, usually used for documenting the spreadsheet.
Quantity contains a constant number, as entered by the user. Formula cells contain a
formula that is evaluated to produce a result. The result of a formula is either a Label,
Quantity, or Error cell type, which is displayed in the spreadsheet.
[Diagram: ICell at the root, with Empty, Label, Quantity, Formula, and Error as its subtypes.]
Figure 4.1: Basic Cell Type Object Hierarchy
We introduce novel cell types that add uncertainty details to the cell. The novel cell
types model a quantity with added knowledge about its uncertainty. They are therefore
derivatives of the Quantity cell type. We consider the traditional Quantity type to
describe a variable with uncertainty ignorance, since there is no associated uncertainty
information. This means that the variable may or may not be subject to uncertainty,
but the system has no evidence either way.
Examples of common novel cell types include: KnownQty, which represents a
quantity that is known to be exact; Estimate, where the quantity is known to be an
estimate; Interval, representing a continuous interval; and Gaussian, representing a
normal distribution. The hierarchy for these novel cells is shown in Figure 4.2.
[Diagram: KnownQty, Estimate, Interval, and Gaussian as subtypes of Quantity.]
Figure 4.2: Novel CellType Object Hierarchy
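A minimal sketch of the hierarchies in Figures 4.1 and 4.2 might look as follows (illustrative Python; attribute names are assumptions, not the prototype's actual API):

```python
# Illustrative sketch of the cell type hierarchy.
class ICell:                       # interface all cell types implement
    pass

class Empty(ICell): pass           # holds only formatting information
class Label(ICell):                # arbitrary text, e.g. documentation
    def __init__(self, text): self.text = text
class Quantity(ICell):             # "uncertainty ignorance": no evidence either way
    def __init__(self, value): self.value = value

class KnownQty(Quantity): pass     # known to be exact
class Estimate(Quantity): pass     # known to be an estimate
class Interval(Quantity):          # continuous interval
    def __init__(self, lo, hi):
        super().__init__((lo + hi) / 2)
        self.lo, self.hi = lo, hi
class Gaussian(Quantity):          # normal distribution
    def __init__(self, mean, sigma):
        super().__init__(mean)
        self.sigma = sigma
```

Because the novel types derive from Quantity, code written against Quantity continues to work when a cell gains uncertainty details.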
Each cell type has a distinguishing string format by which the user can input it into
the system. Our prototype uses the formats listed in Table 4.1 to determine the type of
cells. If none match then the input is assumed to be a Label type, which is capable of
holding any string value. The string is then parsed by the appropriate handler, which
generates a cell object. For example, the string “10+-2” will be processed by the
interval component of the uncertainty plug-in to produce an interval of 10 ± 2. The
resulting cell object is inserted into the current sheet at the current cursor location.
Type       Format                  Example
Formula    = Expression            =A1+6
Label      'String                 '12.06
Quantity   Number                  12.06
KnownQty   Number#                 12.06#
Estimate   ~Number                 ~12.06
Interval   Number +- Number        12 +- 5.0e-10
           [ Number , Number ]     [1, 2.1]
           [ Number .. Number ]    [-15..15]
Gaussian   Number @ Number         10.1 @ 2.178
Table 4.1: Format of Cells in the Prototype System
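The type-detection step can be sketched as follows (the regular expressions are illustrative reconstructions of Table 4.1, not the prototype's exact grammar):

```python
# Illustrative sketch: classify a cell's input string by the formats of Table 4.1.
import re

NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"   # a plain or scientific number

def classify(text):
    text = text.strip()
    if text.startswith("="):                         return "Formula"
    if text.startswith("'"):                         return "Label"
    if re.fullmatch(NUM + r"#", text):               return "KnownQty"
    if re.fullmatch(r"~" + NUM, text):               return "Estimate"
    if re.fullmatch(NUM + r"\s*\+-\s*" + NUM, text): return "Interval"
    if re.fullmatch(r"\[\s*" + NUM + r"\s*(,|\.\.)\s*" + NUM + r"\s*\]", text):
        return "Interval"
    if re.fullmatch(NUM + r"\s*@\s*" + NUM, text):   return "Gaussian"
    if re.fullmatch(NUM, text):                      return "Quantity"
    return "Label"    # anything else falls back to a Label, per the text

classify("10+-2")     # "Interval", to be parsed as 10 ± 2
```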
Formulae are identified by a leading equals symbol. The remainder of the string is
parsed and converted into a sequence of function calls, with infix operators (e.g. “+”)
being converted into functions (e.g. “add”). Function tables are used to invoke the
appropriate function for the parameter types, which mirrors operator overloading in
traditional programming languages. The use of formulae creates functional relations
between cells and from these relations a dependency tree is built. The dependency
tree lists the cells that directly depend upon a particular cell. It is a stipulation of
spreadsheets that there cannot be any circular references as this would create an infinite
loop. When a user completes updating an existing cell, the system recalculates any
affected cells. Affected cells are determined by walking the dependency tree, starting
with the current node. If the current node is not a member of the dependency tree then
no other cells need updating.
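The recalculation walk can be sketched as follows (illustrative Python; the prototype's actual data structures may differ, and cycle detection is omitted since circular references are already disallowed):

```python
# Illustrative sketch: collect the cells affected by a change by walking
# the dependency tree from the updated cell.
def affected_cells(dependents, changed):
    """dependents maps a cell to the cells that directly depend on it.
    Returns every cell that directly or transitively depends on `changed`."""
    seen, stack = [], [changed]
    while stack:
        cell = stack.pop()
        for dep in dependents.get(cell, []):
            if dep not in seen:
                seen.append(dep)
                stack.append(dep)
    return seen

deps = {"A1": ["B1"], "B1": ["C1", "D1"]}
affected_cells(deps, "A1")   # B1, C1 and D1 need recalculating
```

If the changed cell has no entry in the tree, the walk returns nothing and no other cells are recalculated, matching the behavior described above.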
The display of uncertainty details within each cell can lead to visual clutter. To re-
duce clutter, the system has an option to hide the uncertainty details and instead display
a representative value in the cell. Representative values are estimates of what an uncer-
tain variable might actually be. This works well because the user will typically think of
the cell contents as the representative value. Figure 4.3 shows a screen-shot of the pro-
totype system where uncertainty hiding has been enabled. The currently highlighted
cell shows the value 171.51, whereas the cell content is actually "normpdf(175.51,5)".
This behavior parallels formulae, which display the result of the calculation in the cell
rather than the formula itself. The prototype automatically shades cells to help identify
those that are uncertain. The default colors were chosen to contrast with one another
while remaining subdued. They are user configurable and automatic coloring can also
be disabled.
For many common uncertainty types (e.g. estimates, intervals, Gaussian probabilities) the representative value is straightforward. However, in the case of unusual
uncertainty models this may not be the case. For example, consider a variable that can
only be either 3 or 4, but no other value. The representative value should therefore not
be 3.5, as this would be misleading. We define the principle of representation, which
states that the representative value must be a valid possibility for the variable. The
choice of which value to show depends on the implementation and might be guided by
application domain needs.
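The principle of representation can be illustrated with the two-valued example above (hypothetical Python; the choice of which attainable value to return is implementation-dependent):

```python
# Illustrative sketch of the principle of representation: the displayed
# value must be a valid possibility for the variable.
class TwoPoint:
    """A variable that can only be one of two values, e.g. 3 or 4."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def representative(self):
        # Not the mean (3.5 would be misleading, as the variable can
        # never take that value); return an attainable value instead.
        return self.a

TwoPoint(3, 4).representative()   # 3, never 3.5
```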
4.3.2 Uncertainty Abstraction

The visualization system is implemented as a specialized sheet, called a visualization
sheet. The layout of the visualization sheet matches the scene graph structure [28],
with every non-empty row of the visualization sheet representing a node in the graph.
The starting column indicates the position of the node in the hierarchy, where a row
that begins in column n will be a child node of its nearest preceding row that begins in
columns 1..n−1, or the root node if none can be found. Thus, all rows beginning with
the column “A” will be children of the root node, while rows beginning in column “B”
will be children of a preceding row beginning in column “A,” etc. The first non-empty
column contains a string key that determines the node type. The subsequent columns
contain the parameters for that node. The node objects manage their own scene graph nodes, and it is the responsibility of each node object to perform type checking on its parameters.
Figure 4.4 shows a visualization sheet with three children of the root node: a Title
node, which displays title text; a 2Daxes node, which generates a rectangular grid; and
a Scale node, which adds a scaling to the transformation matrix of its children. The
2Daxes node has two children, which specify the labels for each axis. The Scale node
has a Translate child, and together they position the data correctly over the 2Daxes
object. The Color node specifies that its child should be drawn in blue. The AreaLine
node takes a sequence of lower and upper y-values and produces the polygon repre-
senting the data. All parameter fields can either contain a constant value or a formula.
In Figure 4.4 the AreaLine node is a child of the Color node, which is a child of the
Translate node, which is itself a child of the Scale node. This construction was used so
that the AreaLine would be scaled, positioned, and colored. The first parameter to the
Xaxis and Yaxis nodes determines the normalized position of the origin. Subsequent
parameters contain the axis labels.
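The row-to-hierarchy rule can be sketched as follows (illustrative Python; row contents are reduced to a starting column and a node key, and node keys are taken from the Figure 4.4 example):

```python
# Illustrative sketch: a row starting in column n becomes a child of the
# nearest preceding row starting in a column < n, or of the root otherwise.
def build_tree(rows):
    """rows: list of (start_col, key); returns {child_key: parent_key}."""
    parents, stack = {}, []   # stack holds (start_col, key) of open ancestors
    for col, key in rows:
        while stack and stack[-1][0] >= col:
            stack.pop()       # close rows that cannot be this row's parent
        parents[key] = stack[-1][1] if stack else "root"
        stack.append((col, key))
    return parents

rows = [(1, "Title"), (1, "2Daxes"), (2, "Xaxis"), (2, "Yaxis"),
        (1, "Scale"), (2, "Translate")]
build_tree(rows)
# Title, 2Daxes and Scale are children of root; Xaxis and Yaxis of 2Daxes;
# Translate of Scale.
```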
Typical spreadsheet languages contain basic operators (such as addition, subtrac-
tion, multiplication, and division) and usually a wealth of commonly used functions
(such as statistical and financial functions). To support information uncertainty, the
Figure 4.3: Screen-shot of the Prototype
Figure 4.4: Visualization Sheet for the Graph in Figure 4.7
language needs to be extended to allow access to the underlying uncertainty informa-
tion. Two categories of new functions are added. The first category contains the con-
version operators, which convert from one type of uncertainty model to another. The
second category involves interrogation of the uncertainty details, which is intended to
be used when mapping to visual elements. The prototype system provides four such
interrogation functions, listed in Table 4.2.
Function         Returns
isCertain(x)     True if the cell x does not have associated uncertainty.
Lower(x)         The lower bound of the cell x.
Upper(x)         The upper bound of the cell x.
Certainty(x, y)  The generalized certainty of the cell x being y, where y
                 is either a constant or another cell.
Table 4.2: Prototype Uncertainty Interrogation Functions
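As a minimal sketch, the interrogation functions in Table 4.2 could behave as follows for an interval-valued cell. The `IntervalCell` type and all function names below are illustrative assumptions, not the prototype's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class IntervalCell:
    """Illustrative interval-valued cell (not the prototype's actual type)."""
    lower: float
    upper: float

def is_certain(x):
    # A cell is certain when its bounds coincide: no associated uncertainty.
    return x.lower == x.upper

def lower(x):
    return x.lower

def upper(x):
    return x.upper

def certainty(x, y):
    # Generalized certainty of cell x being the value y: for an interval,
    # 1 inside the bounds and 0 outside (set membership).
    return 1.0 if x.lower <= y <= x.upper else 0.0

growth = IntervalCell(6.0, 8.0)   # the 7 +/- 1 salary growth of Section 4.6
print(is_certain(growth), lower(growth), upper(growth), certainty(growth, 7.5))
```

Because all four functions are defined for every numerical type, a crisp value simply reports coinciding bounds.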
The mathematical model for managing the propagation of uncertainty through op-
erations is user selectable. Our approach is to use a function table, where the system
can look up the appropriate function to handle the requested operation. Plug-ins reg-
ister functions, which have a signature based on the operation and types of the param-
eters. Multiple functions can be registered for the same {operation, parameter types}
signature, but only one can be active at any given time. This includes both standard
functions and the uncertainty interrogation functions in Table 4.2.
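The look-up mechanism described above can be sketched as a dispatch table keyed by operation and parameter types. The registration API and the interval type below are illustrative assumptions, not the prototype's code:

```python
# All names here are illustrative assumptions, not the prototype's code.
function_table = {}

def register(op, types, fn):
    # Registering for an existing {operation, parameter types} signature
    # replaces the currently active handler.
    function_table[(op, types)] = fn

def apply_op(op, a, b):
    # Resolve the handler from the operation and the runtime parameter types.
    return function_table[(op, (type(a).__name__, type(b).__name__))](a, b)

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

# Interval addition: add the bounds pairwise.
register("add", ("Interval", "Interval"),
         lambda a, b: Interval(a.lo + b.lo, a.hi + b.hi))
# Mixed interval/scalar addition shifts both bounds by the scalar.
register("add", ("Interval", "float"),
         lambda a, b: Interval(a.lo + b, a.hi + b))

r = apply_op("add", Interval(6.0, 8.0), 1.5)
print(r.lo, r.hi)
```

Plug-ins supplying an alternative mathematical model would re-register handlers for the same signatures, leaving the spreadsheet contents untouched.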
4.4 New Process and Workflow

We argue for an incremental process for building uncertainty spreadsheets. It is
typically most convenient to begin construction of a model at the high level, where the
focus is on logic, before proceeding to add details. The use of uncertainty modeling
increases detail and is therefore typically most conveniently added later, once the basic
structure of the model is in place.
Figure 4.5 shows the process for building an uncertainty spreadsheet. In the first
step an initial spreadsheet is built, typically using uncertainty ignorance data types.
This step is similar to building a traditional spreadsheet, and includes the spreadsheet
structure, variables, and formulae. Next, the spreadsheet is iteratively refined in three
ways: firstly, uncertainty details are added, changed, or removed from variables; sec-
ondly, visualizations in the model are added, altered, or removed; and thirdly, the
model can be refined in the traditional sense, such as changing formulae or adding
variables.
The task labeled refine uncertainty details consists of two types of activities: adding,
changing, and removing uncertainty detail, and changing the mathematical model for
Figure 4.5: Process for Constructing an Uncertainty Spreadsheet
propagation. There are three main steps to add, change, or remove uncertainty de-
tails: firstly, the appropriate variable is identified, e.g. a variable whose uncertainty
is currently ignored; secondly, its details are changed; and thirdly those changes are
evaluated, returning to the second step if found to be inadequate.
4.5 Capabilities and Advantages

The system described here extends the spreadsheet paradigm using a plug-in
architecture to allow arbitrary cell types. The uncertainty plug-in implements information
uncertainty modeling techniques. This allows users to model a system using informa-
tion uncertainty as a native type. The parameters of the uncertainty are inherent in the
cell, providing structural and semantic support for the uncertainty modeling technique,
thereby avoiding potential errors that can arise when parameters are separated.
Conversion operators are supplied by the plug-ins that allow the user to convert
a cell of one type into another. The user is able to choose the mathematical model
that they wish to use from options supplied by plug-ins. The user is now free to itera-
tively build uncertainty into a model: first, a rough crisp data model is produced as a
proof-of-concept; then, the user refines the data model by promoting variables to add
uncertainty detail. The system handles propagation of the uncertainty automatically
and each variable with uncertainty is treated as a unit of information.
The use of a visualization sheet keeps the interface consistent and brings the power
of formulae to the visual mapping process. Through the combinations of several visual
elements, sophisticated visualizations can be constructed by the user. This provides the
flexibility to perform traditional visualization tasks as well as supporting the sometimes
unusual needs of information uncertainty visualization.
Comparison to Traditional Spreadsheets
The method for incorporating uncertainty in a traditional spreadsheet consists of three
major changes. Firstly, the uncertainty details must be recorded in the spreadsheet
somewhere, resulting in additional cells being used. The addition of new cells changes
the layout of the spreadsheet and increases the amount of information that the user
faces. Furthermore, the number of cells that are added depends on the number of pa-
rameters required by the uncertainty data type. Secondly, formulae need to be changed
to incorporate the propagation of uncertainty details. These formulae become harder
to understand, because the uncertainty information handling obscures the fundamental
operation. The uncertainty information propagation must also be carried forward to
all downstream formulae, which can be many. Thirdly, any graphs or visualizations
should be updated to include uncertainty information as appropriate.
There are four limitations to traditional spreadsheets for incorporating uncertainty.
Firstly, the user is required to be intimately aware of the uncertainty modeling tech-
nique, including rules for its propagation, before they can incorporate it in their model.
Secondly, adding uncertainty information after the model is already in place becomes
an arduous task that is error prone. Should more information come to light, for which
another uncertainty modeling technique is more appropriate, then all affected parts of
the model have to be manually rebuilt. Thus, it is prohibitive to change the level of
uncertainty information after the initial design. Thirdly, changing propagation rules
requires all formulae to be rewritten. Thus, it is also prohibitive to change the math-
ematical propagation model. Fourthly, there are currently few built-in visualization
techniques for information uncertainty. The visualization techniques that are supplied
target specific uncertainty modeling types (e.g. intervals). Creating sophisticated visu-
alizations typically requires users to export their data to a more advanced visualization
system.
Our system overcomes these four limitations. In contrast to adding new cells to
the spreadsheet, our approach is to store the uncertainty information in the same cell.
The immediate advantages of this approach are that the spreadsheet does not change in
layout and the number of cells does not increase, irrespective of the type or volume of un-
certainty information it contains. Furthermore, the system is aware of this uncertainty
information and has a mechanism for resolving appropriate propagation operations in
formulae, meaning that formulae do not change. Thus, to add, change, or remove un-
certainty for a variable is a local change to a single cell. Exceptions only occur where
the user’s chosen mathematical model prohibits particular operations or combinations,
which is no different to any traditional approach. The system resolves operations using
a table of operations that the user can control at a global level. Therefore, should an al-
ternative mathematical model be needed, no change to the actual spreadsheet contents
is required.
Our system uses a flexible visualization sheet that allows sophisticated visualiza-
tions to be explored. The advantage is that any changes to the spreadsheet are immedi-
ately reflected in the visualizations. The next section illustrates these advantages when
applied to a financial model.
4.6 Case Study: Financial Decision Support

This section describes the advantages of using our architecture over a traditional
spreadsheet. It also illustrates how an uncertainty spreadsheet is constructed and used through
a case study. The problem to be explored is understanding and visualizing the prof-
itability of a prospective investment property. Acquiring property for investment and
rental income is a common prospect for many who may not have a background in fi-
nance. However, there are many estimations and subtle interactions between variables
that can have significant effects on the profit outcomes. Furthermore, many of these
interactions are poorly understood or difficult to define, even for experts.
The decision to acquire is based on profitability of the investment. Therefore, the
output of the model is a Net Present Value (NPV) calculation that gives a comparison of
the profitability of buying a property using a deposit against investing that same deposit
into a fixed interest vehicle. A positive NPV indicates that the property investment is
more attractive.
The NPV calculation is as follows:

NPV = ∑_{n=1}^{t} CashFlow(n) / (1 + i_n)^n

where t is the number of years the property is held; i_n is the after-tax interest rate
in year n; and CashFlow is given by

CashFlow = r − p − o + x − CI + CO − u

where r is rental income; p is the loan payment for the current year; o is the ongoing
expenses; x is the tax refund due to investment; CI is the deposit paid on purchase; CO
is the deposit plus net profit on sale; and u is the upfront costs.
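The NPV sum above can be computed directly; the cash flows and rates below are made-up numbers for illustration only:

```python
def npv(cash_flows, rates):
    # cash_flows[n-1] holds CashFlow(n); rates[n-1] holds the after-tax
    # interest rate i_n for year n (1-indexed in the formula above).
    return sum(cf / (1 + i) ** n
               for n, (cf, i) in enumerate(zip(cash_flows, rates), start=1))

# Illustrative three-year example: an initial outflow, then two inflows,
# discounted at a flat 5% after-tax rate.
print(round(npv([-100.0, 60.0, 60.0], [0.05, 0.05, 0.05]), 2))
```

A positive result would indicate, under the model above, that the property investment is the more attractive option.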
Building the Initial Spreadsheet
If uncertainty is ignored, then the common approach to this problem is to create a tab-
ular spreadsheet: each column contains a variable and each iteration of n adds another
row. A summary page is created where input variables, such as increases in salary,
can be placed in an accessible location. The user is able to change input details and
observe their effects over a number of years, usually with the aid of graphs.
Most of the variables in this model are subject to uncertainty. For example: rental
income becomes progressively less certain the farther into the future it is predicted;
loan repayments are similarly uncertain since they are dependent on a variable interest
rate; the tax refund is uncertain because it depends upon taxation law, employment
status, and promotions, all of which can change unexpectedly; and the net profit on
sale is always subject to uncertainty.
Spreadsheet Layout is Unchanged
Adding uncertainty details using our system does not add new cells or change the
spreadsheet layout. Figure 4.6 illustrates this by using the annual salary increase pro-
jected over 20 years in the prototype software. There are four variables: salary growth,
salary, tax, and net income. Uncertainty information propagates from salary increase
to salary to tax. The user wishes to model the salary increase as an interval of 7±1
(6 to 8). Figure 4.6 (a) shows the original spreadsheet model prior to modeling an
interval. Figure 4.6 (b) presents a solution using a traditional spreadsheet approach,
which requires six columns to represent three variables. Each column had to be manu-
ally added and the formula for tax and salary had to be updated to reflect this change.
Figure 4.6 (c) shows our prototype system with uncertainty hiding switched on. The
salary growth field was promoted to an interval (7±1) and no other change was made.
In this view the updated model closely reflects the original1. Figure 4.6 (d) is the
same as Figure 4.6 (c) with uncertainty hiding switched off, thus showing the same
information as is found in Figure 4.6 (b).
Figure 4.6: Interval Modeling Example: (a) Original Model (b) Traditional Spreadsheet (c) Prototype System, Uncertainty Hidden (d) Prototype System, Uncertainty Shown
1Note that the number shown represents the halfway value between the upper and lower limits. The upper limit grows more rapidly than the lower limit, thus the mean value for [6,8]% growth will not match 7% growth.
Formulae are Unchanged
The shaded cells indicate that they contain an interval; thus it can be seen from Fig-
ure 4.6 (d) that the uncertainty is propagated automatically to both salary and tax. The
formulae for these cells, however, are unchanged. It is noteworthy that while the figure
shows the representative value in each cell, the user can always toggle the viewing op-
tion to show the uncertainty details instead of the representative value. The traditional
approach not only changes the layout, but also requires the formulae to be repeated to
calculate both the low and high rates.
To achieve the same effect using a traditional spreadsheet requires more effort.
Firstly, each affected variable must be expanded to two cells, namely the upper and
lower bounds. This typically involves adding an additional column for each variable
that is calculated over multiple years. Secondly, the propagation of the uncertainty
information must be manually managed by adding the appropriate formulae.
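The contrast can be sketched in code: with an interval type and naive interval arithmetic, promoting salary growth to [6, 8]% propagates bounds through the otherwise unchanged formulae. The starting salary and flat tax rate are invented for this sketch, and the `Interval` class is illustrative, not the prototype's implementation:

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __mul__(self, other):
        # Naive interval multiplication: the extremes of all bound products.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

def promote(x):
    # A crisp value is a degenerate interval, so promotion is local to a cell.
    return x if isinstance(x, Interval) else Interval(x, x)

growth = Interval(1.06, 1.08)      # salary growth promoted to 7 +/- 1 %
salary = promote(50000.0)          # invented starting salary
for year in range(2):              # two years of compounded growth
    salary = salary * growth       # the formula itself is unchanged
tax = salary * promote(0.3)        # invented flat 30% tax rate
print(round(salary.lo, 2), round(salary.hi, 2), round(tax.hi, 2))
```

The single promotion at `growth` is the only change; every downstream product carries bounds automatically.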
Visualization can be Abstracted from Uncertainty Type
Using the traditional spreadsheet limited the graphs to those that the program provided,
of which two could be used to indicate the intervals. The first was a graph that used
error bars, while the second was to overlay the maximum and minimum lines on the
same graph as two different data points. However, these traditional graphs only work
with interval data. In contrast, the graph in Figure 4.7 will work with the other data
types.
The graph in Figure 4.7 was generated using three elements in a visualization sheet:
a title text object, a 2D axes object, and a polygon. The 2D axes object takes as param-
eters the label and range for the vertical and horizontal axes. The polygon requires a
color specified in the first four cells, followed by a series of alternating x and y coor-
dinates. The y coordinate is given first by the lower bound of the variable, then the
upper bound, using formulae of the form “=Lower(cellref )”, where cellref is a refer-
ence to a cell containing NPV for the appropriate year. These functions are defined for
all numerical types. For example, the Upper() and Lower() functions return the same
value when that value is certain, resulting in a line graph.
Changing Uncertainty Models is Easy
The user can choose to use the modeling technique that is appropriate for the vari-
able, with little regard for how the rest of the data is modeled. The interest rates are
unlikely to change maximally and more likely to stay even. Therefore, it is desirable
to model the changes in interest rates as a probability distribution. To model this we
choose a Gaussian distribution centered on no change. Using our system, the user
Figure 4.7: Using an Interval (±0.5) for Annual Change in Interest Rates Propagates the Uncertainty to NPV
simply promotes the annual change in interest rates to a Gaussian probability distri-
bution. As with intervals, the uncertainty will be automatically propagated through to
NPV. If multiple uncertain variables interact, our system automatically manages their
combined uncertainty information.
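For the Gaussian case, automatic propagation of a sum can be sketched under an independence assumption (means add, variances add). The class and the numbers are illustrative, not the prototype's implementation:

```python
import math

class Gaussian:
    """Illustrative Gaussian-distributed variable (not the prototype's type)."""
    def __init__(self, mean, variance):
        self.mean, self.variance = mean, variance
    def __add__(self, other):
        # Sum of independent Gaussians: means add and variances add.
        return Gaussian(self.mean + other.mean, self.variance + other.variance)
    def stddev(self):
        return math.sqrt(self.variance)

base_rate = Gaussian(7.0, 0.0)        # a crisp rate is the zero-variance case
annual_change = Gaussian(0.0, 0.25)   # centred on no change, sigma = 0.5
next_year = base_rate + annual_change
print(next_year.mean, next_year.stddev())
```

Operations other than addition, and correlated variables, would need their own registered propagation rules.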
In contrast, the traditional spreadsheet requires more work to achieve the same
effect. Each variable now requires two cells of a different sort: the first cell to contain
the mean, and the second cell for the variance. Every formula that was previously
written to handle the intervals must now be changed to handle normal distributions,
which requires both mathematical competency as well as care to avoid introducing
errors. If multiple uncertain variables interact, then the formulae must be painstakingly
integrated.
Flexibility for Sophisticated Visualization
The ability to use multiple visual elements, and map data to those elements using
formulae, gives the user the flexibility to create sophisticated visualizations such as
Figure 4.8. This figure shows the most likely NPV against the year of sale and the
property value appreciation. The volume is actually composed by layering several
surfaces, with the certainty of NPV mapped to opacity. The color of the surface is red
if the NPV is negative, green otherwise. A wire-frame outline of the extents of the
thresholded NPV volume was added to provide context.
The information shown in Figure 4.8 could not be produced using current spread-
sheets. Firstly, flexibility of visualization was required to stack multiple surfaces with
varying color and translucency together with a wireframe outline into a single 3D
space. Secondly, the calculations that underpin the uncertainty propagation are com-
plicated enough to be prohibitive.
Figure 4.8: Volumetric Representation of the Most Likely Effect Interest Rate Changes Will Have on NPV.
CHAPTER 5
Uncertainty Encapsulation and Automated
Propagation
5.1 Introduction

This chapter addresses issues in modeling and propagation of uncertainty. There are
numerous information uncertainty modeling techniques, each designed for different
situations. To correctly model the uncertainty for a variable, the appropriate uncer-
tainty model needs to be chosen. This uncertainty can change over time, requiring the
data to be updated. In many instances there will be multiple variables that are subject
to different types of uncertainty. These variables may be combined through an oper-
ation and their associated uncertainty must be correctly propagated. Klir’s principle
of requisite generalization [60] requires that the uncertainty models be converted to a
sufficiently general technique capable of handling the combination. While this princi-
ple may be employed ad hoc by a trained mathematician, such a task will be beyond
the capability of many classes of users.
This chapter presents a framework and software design that allows users to easily
transition between uncertainty models and facilitates automatic propagation of uncer-
tainty. Our approach is to encapsulate the uncertainty with the variable into a unit.
This approach has three significant advantages: firstly, the uncertainty models become
polymorphic, allowing the user to think in terms of variables and not modeling tech-
niques; secondly, it provides structural support for dealing with parameters of the un-
certainty, thereby avoiding common errors that arise when related parameters are sep-
arated; and thirdly, it integrates information uncertainty modeling techniques into a
consistent framework, which enables automated propagation methods.
The rest of this chapter is arranged as follows. Section 5.2 examines conceptualiza-
tion, categorization, and data structures of information uncertainty for a unified frame-
work. Section 5.3 builds on this framework to provide a mechanism for automated
uncertainty propagation using look-up tables. The number of entries can become large
and a strategy for dealing with this is presented, called the hierarchical heterogeneous
propagation method. Section 5.4 offers a summary of the chapter.
5.2 Unified Information Uncertainty Framework

This section seeks to integrate information uncertainty modeling techniques into a
coherent framework. We take the perspective that information uncertainty modeling is
a way of improving the fidelity of the model. By adding information about the uncer-
tainty of a variable, the user is increasing the level of knowledge about that variable.
Consider the future employment growth rates that are used in Figure 3.1, which are
shown in Table 5.1. The value used in graph (a) is ignorant of potential for variance
in the predicted growth assuming its value to be certain. In graph (b), it is known that
the value is only an estimate, which is additional knowledge that was not available in
graph (a). Graph (c) adds further information: it is certain that the value will be within
specified bounds. Graph (d) adds even more information, assigning a degree of cer-
tainty to what the actual growth value will be. With each graph, more information is
shown about the future employment rates.
Graph  Growth Rate           Known / Assumed
(a)    0.147, certain        it is certainly 0.147
(b)    0.147, estimated      it is not necessarily 0.147
(c)    [−0.3, 0.6]           it is between −0.3 and 0.6
(d)    μ = 0.147, σ = 0.1    it is probably 0.147
Table 5.1: Predicted Growth Rates used in Figure 3.1
The remainder of this section is split into three parts. First we discuss the con-
ceptualization of information uncertainty from the abstract through to detailed models.
We then categorize modeling techniques by their level of detail. Finally, we describe
the data-structures for encoding various forms of information uncertainty.
5.2.1 Conceptualizing Information Uncertainty and its Usage

Uncertainty exists in many problems, arising from sources such as linguistic ambigu-
ity, the uncertainty of predicting future outcomes, and errors obtained during measure-
ment. In a philosophical sense, a variable begins with total uncertainty as an unknown
variable. This variable can be assigned a name once it comes to the attention of the
user. At this point the value of the variable is still uncertain and has an unknown value.
For example, we can declare the variable α without declaring the value of α. Since it
has unknown value, it has a theoretically infinite potential to be any value. Thus, every
value of α is both possible and, as far as we know, equally likely.
A variable will most typically be assigned a particular value, for example α = 5.
However, this does not eliminate the uncertainty, because there are three possibilities:
firstly, five can be the actual value beyond any doubt (absolute certainty); secondly, α
can be guessed to be five (uninformed estimate); or thirdly, α can be assumed to be five
but it is unknown whether there is potential to be otherwise (uncertainty ignorance).
Knowing only the value of a variable does not explicitly differentiate between these
three possibilities. However, many data models that are used in the real world stop
at this level of detail, thus discarding valuable information. Uncertainty ignorance is
usually implied: a single value is commonly used even for variables that have an implicit
uncertainty. For example, the future spot price of oil can be estimated at a particular
value and it is generally understood that this value is not certain.
If it is known that the value of the variable cannot be outside a particular range, then
more is known about the variable and the amount of uncertainty is therefore reduced.
For example, the interval 4.5 ≤ α ≤ 5.5 allows for a 0.5 margin of error around five
and is often written as α = [4.5,5.5]. Figure 5.1 illustrates an example progression
of α from unknown variable to interval. Each box represents the type of information
uncertainty, while arrows represent new information about the variable α . Information
about the uncertainty is referred to as uncertainty information. As more uncertainty
information is added, the information uncertainty becomes better defined.
Unknown Variable → Unknown Value → Uncertainty Ignorance (α = 5)
→ Uninformed Estimate (5 is an estimate) → Interval (4.5 ≤ α ≤ 5.5)
Figure 5.1: Progression Through States of Information Uncertainty (Boxes) as a Resultof Information (Arrows)
Variables that are subject to information uncertainty can be considered to be a cloud
of potential values in an uncertainty space. The appropriate information uncertainty
data structure is used to parametrize the uncertainty using available knowledge. At
some point, the variable is collapsed onto the real-number line to produce an estimate.
Figure 5.2 illustrates these concepts: the information uncertainty is defined by a cloud
in uncertainty space, while a collapsed value is an estimate on the set of real numbers.
Figure 5.2: Projection of Information Uncertainty onto an Estimate Point
A data model without uncertainty details is, in fact, a data model that deals with
already collapsed values. Any difference between the value in the data model and the
actual true value is error. The user may or may not be aware of this error. In the case
of predictions, for example, it is generally understood that there can be a divergence
between the model and the actual value. However, there are other sources of uncer-
tainty where the potential for error may not be so obvious. In particular, the potential
for error accumulates as variables are combined through mathematical operations. The
alternative approach is to incorporate uncertainty details in the model, which delays the
collapse until it is necessary. Delaying the collapse as late as possible allows the poten-
tial error to be carried throughout the model. This uncertainty can then be visualized
to gain a deeper understanding of the impacts it can have.
We now briefly cover major uncertainty modeling techniques with reference to po-
tential collapses and the uncertainty parameters. Defining α as an interval determines
the possible values that α can hold, since a potential value of α is only possible if it is
within the interval. Thus, if α is defined by an interval bounded inclusively by l and u
such that l ≤ u and l,u∈R, then any potential value of α must be an element of the set
[l,u]. Another way of expressing set membership is through a membership function,
μ(x) = { 1,  l ≤ x ≤ u
       { 0,  otherwise
where a value of 1 indicates inclusion and 0 indicates exclusion. The uncertainty arises
because all values x are valid collapses of the variable α so long as μ(x) = 1. Figure 5.3
graphs the membership function and all collapses for the interval [4.5,5.5].
Figure 5.3: All Collapses of the Interval [4.5,5.5]
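The interval membership function defined above can be transcribed directly, here for the interval [4.5, 5.5]:

```python
def membership(x, l=4.5, u=5.5):
    # mu(x) = 1 if l <= x <= u, else 0; every x with mu(x) = 1 is a
    # possible collapse of the interval-valued variable.
    return 1 if l <= x <= u else 0

print([membership(x) for x in (4.0, 4.5, 5.0, 5.5, 6.0)])
```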
Finer grades of uncertainty specification are allowed when the range of μ is not
constrained to either 0 or 1. For example, rough sets allow three possible states: totally
an element, partially an element, and not an element; while fuzzy sets allow the range
of μ to vary over the interval [0,1]. So long as μ(x) > 0, a collapse to x is possible, as
illustrated by the graph of a fuzzy variable in Figure 5.4.
Figure 5.4: All Collapses of a Fuzzy Number Around 5
A graded membership allows the user to specify the degree to which potential
values should be considered a valid collapse of the variable. The advantage of doing so
is that a cut plane can be used to control the amount of compliance desired. A cut plane
is a minimum value that is required of μ in order to be considered for collapse. For
example, Figure 5.5 shows a reduced collapse set for the same variable as in Figure 5.4
that results from using a cut plane.
Figure 5.5: Collapses for a Fuzzy Number Around 5, Using a Cut Plane
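A cut plane can be sketched as a filter over a graded membership function; the triangular shape used here is an illustrative choice, not necessarily the one in Figure 5.4:

```python
def triangular(x, a=4.0, peak=5.0, b=6.0):
    # Piecewise-linear membership rising from a to the peak, falling to b.
    if x <= a or x >= b:
        return 0.0
    if x <= peak:
        return (x - a) / (peak - a)
    return (b - x) / (b - peak)

def cut(values, level):
    # The cut plane keeps only values whose membership meets the level.
    return [x for x in values if triangular(x) >= level]

xs = [4.0, 4.25, 4.5, 4.75, 5.0, 5.25, 5.5, 5.75, 6.0]
print(cut(xs, 0.5))
```

Raising the cut level shrinks the collapse set, exactly as Figure 5.5 illustrates.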
Probability based systems have a similar construct, called the Probability Density
Function (PDF). There is an additional requirement that the integral of a PDF should
be 1. Infinitesimally small likelihoods are approximately zero for practical purposes.
The various types of sets and the probability density functions can also be discrete,
in which case the membership or probability is only non-zero for a finite number of
collapses. An example is the probability density function for the sum of a throw of two
dice: the probability is only non-zero for integers from 2 to 12. All other outcomes are
impossible, including fractional outcomes.
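The two-dice example can be checked directly; exact rational arithmetic keeps the probabilities precise:

```python
from fractions import Fraction
from itertools import product

def two_dice_pmf():
    # Discrete probability function for the sum of two fair dice:
    # non-zero only for the integers 2..12.
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        s = a + b
        pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = two_dice_pmf()
print(pmf[7], pmf[2], sum(pmf.values()))  # 1/6 1/36 1
```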
This section described information uncertainty in terms of an uncertainty space
and its real number collapses. Several types of uncertainty modeling techniques were
reviewed with reference to this conceptualization. We now categorize these into a
coherent framework.
5.2.2 Categorization of Uncertainty Models

There are several modeling techniques that can be used to describe information uncer-
tainty with varying degrees of knowledge. We place these techniques into one of five
general categories: estimate, where the value is not guaranteed to be the true value;
non-specificity, where the true value is known to be one of a set of values; probabil-
ity, where the likelihood of the true value is known; membership, where the degree of
membership1 within a group or label is known; and belief, where the believability of
values is known. The categories are based on the type of knowledge that is encoded,
which is necessary for determining suitable display techniques. Table 5.2 presents
these categories together with commonly used modeling techniques.
5.2.2.1 Known Value
The known value types indicate that the value is known to be correct. The fact that this
value is known to be correct is information about the uncertainty, namely that there is
no uncertainty. While traditional techniques (scalar values, etc) are often assumed to
be known values, we reserve this category for value types that specifically hold values
that are known to be exact. We consider traditional techniques to represent uncertainty
ignorance, since it is not explicitly stated whether the value is true or not.
5.2.2.2 Estimation
In the context of this categorization system the term estimation refers to a reasonable
guess at the value of a variable. The estimation category describes modeling techniques
where the data is known to be uncertain, but no parameters of the uncertainty are
encoded. For example, the predicted gross domestic product for a country can be given
1Standard sets do not employ a notion of partial membership and are thus part of the non-specificity category.
Category            What is known                      Example Modeling Techniques
Known Value         The value is accurate              traditional scalar
Estimation          The value is not accurate          traditional scalars, vectors
Non-specificity     The value is one of known          intervals, sets
                    alternatives
Probability         The likelihood of each             probability distributions
                    potential value
Partial Membership  The degree of belonging to each    rough sets, fuzzy sets
                    set for each potential value
Belief              Availability of evidence           Dempster-Shafer calculus,
                                                       Bayesian probabilities
Table 5.2: Categories of Information Uncertainty Modeling Techniques
as a number. It is understood that this might not be the exact future gross domestic
product, but the variable does not provide any further details about this uncertainty.
For this reason, traditional encodings that do not model uncertainty, such as
floating-point numbers, can act as estimations. Estimations provide a heuristic ap-
proach to uncertainty, allowing each user to interpret its meaning according to their
own understanding.
5.2.2.3 Non-specificity
Non-specificity refers to methods that describe a realm of possible matches, but are
non-specific about which match it is. An example of a problem suitable to non-
specificity is errors in measurement, where the real value is considered to be one of
a possible range.
The basic modeling technique underlying non-specificity is the mathematical set.
Such a set can be a discrete set, such as the set of possible conditions as diagnosed by
a doctor; or a continuous set, such as the interval covering the measurement error.
Non-specificity states that one of the alternatives is assumed to be the case, but
does not express a preference for which one, nor does it allow for partial matches. For
visualization this poses a challenge to avoid misinterpretation as to the likelihood of
alternative possibilities. However, non-specificity models can be considered to imply
the degree of uncertainty by the number of alternatives offered. As the number of
possible alternatives reduces, the uncertainty of each member also reduces until the
limit of no uncertainty, which occurs where there is only one possibility.
68 Chapter 5. Uncertainty Encapsulation and Automated Propagation
5.2.2.4 Probability
The class of modeling techniques that detail the level of likelihood for positive matches
use probability. Probabilities are derived by determining the proportion of inputs that
are likely to produce positive matches, given random samples. An example of a prob-
lem suitable to probability is a variable whose prior values are known, from which
a picture of likely future values is built. The universe of discourse is described as a
joint probability, which provides an expectation of every possible outcome. An ex-
ample from the medical domain is the probability of a particular disease, given the
socio-demographic profile of the patient. This probability was derived from prior ob-
servations of the population.
Probability differs from non-specificity because it quantifies the expectation for
each match, but is similar to non-specificity because it assumes only a single positive
outcome. Although the probability of an outcome can appear insignificantly low, it is
still possible. Depending on the user objective during visualization, attention may be
required to ensure that low probabilities are not masked by more likely options.
5.2.2.5 Partial Membership
The partial membership category of information uncertainty modeling techniques encodes a degree of membership, allowing for partial membership in multiple sets. Commonly, degree of membership modeling techniques are used in situations where the
source of the uncertainty arises from vagueness and they are particularly suited to nat-
ural language problems. An example problem for which the degrees of membership
are suitable is where human interpretations need to be encoded as variables. Another
use for degree of membership is simplification of complicated systems: Systems of
many rules and exceptions can often be replaced with simpler, more elastic, rules that
observe partial membership conditions.
Information modeling techniques that express degrees of membership include rough
sets and fuzzy sets, both of which have accompanying novel definitions for logical op-
erations [89, 79, 77]. An example in the medical scenario would be a situation where
the medical record states that “there were strong signs of epidermal scarring”. In this
example the degree of membership in the “epidermal scarring” set would be strong,
which may translate into a 0.9 degree of membership. The degree of membership in
the set of “no epidermal scarring” would be weak, or 0.1.
Unlike non-specificity and probability, which describe a single positive match, de-
gree of membership approaches allow the same variable to be treated as a positive
match in several sets simultaneously. The greater the degree of membership within a
set, the more certainly the variable can be treated as a member of that set. Unlike non-
specificity, where the uncertainty is implied by the number of alternatives, the amount
of uncertainty is directly encoded by the degree of membership.
5.2.2.6 Belief
The final category of information uncertainty modeling techniques deals with belief.
Degrees of belief are used in evidence-based reasoning systems, such as expert sys-
tems. An example of a problem suitable to degrees of belief is a situation where the
veracity of the outcome must be supported by evidence.
Techniques for operating on degrees of belief include Bayesian inference, which
uses probabilities to model belief. Other techniques include Dempster-Shafer calculus (see [107]). Belief is similar to probability, because the uncertainty of a positive
match is encoded. Degrees of belief are unlike probabilities, because they are not
based on knowledge of the universe of discourse, but give an indication of certainty
that a positive match exists based on available evidence. Sometimes this evidence is
subjective.
An example in the medical domain is the degree of belief that a recently discovered
disease is contagious. Given the contagiousness of all known diseases, the probability
of a new disease being contagious might be 0.25; however, without any evidence to
suggest that the new disease is contagious, the degree of belief that the new disease is
contagious is zero.
For visualization purposes, degrees of belief can be treated similarly to probabil-
ity. However, users may more often be concerned by exceptional or poorly supported
matches.
5.2.3 Data Structures for Information Uncertainty
This section outlines data structures for recording uncertainty information about variables. These data structures hold parameters that describe the uncertainty space, from
which the possible collapses can be derived. Common information uncertainty model-
ing techniques are listed in Table 5.3. Items in italics indicate abstract types.
The data structure descriptions given here assume that instantiated data types can
be distinguished at runtime. If the environment does not support this feature, then the
data structures would need to be extended to include a type identifier.
The Quantity type holds a single real number and indicates a variable with uncertainty ignorance. All data structures listed here inherit from the Quantity type,
since they can be considered to be refinements on uncertainty ignorance. The absolute
certainty and unsubstantiated estimate types are derivatives of Quantity that do not add
further parameters, so long as the environment can distinguish between them.
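Under the assumption of a language with runtime type inspection, the relationship between Quantity and its derivative types can be sketched as follows. This is a hedged Python illustration; the class names are adapted from Table 5.3 and are not the thesis implementation:

```python
class Quantity:
    """Single real number; the potential existence of uncertainty is ignored."""
    def __init__(self, value: float):
        self.value = value

class AbsoluteCertainty(Quantity):
    """A value known to carry no uncertainty; adds no further parameters."""

class UnsubstantiatedEstimate(Quantity):
    """An estimate or 'guess'; adds no further parameters."""

x = UnsubstantiatedEstimate(5.0)
# Runtime type inspection distinguishes the refinements from plain Quantity:
assert isinstance(x, Quantity) and type(x) is UnsubstantiatedEstimate
```

In an environment without runtime type information, each record would instead carry an explicit type identifier field, as noted above.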
The discrete classical set type stores a set of potential values. A single value from
the set is chosen to be the representative value. It is invalid for the representative value
70 Chapter 5. Uncertainty Encapsulation and Automated Propagation
Type                                          Description
Quantity (Uncertainty Ignorance)              Ignore the potential existence of uncertainty
Absolute Certainty                            No uncertainty
Unsubstantiated Estimate                      An estimate or "guess"
Confidence                                    Singular valued graded possibility
Discrete Classical Set                        Set of possibilities
Interval                                      Convex continuous set
Set of Intervals                              Arbitrary continuous set (often non-convex)
Discrete Rough Set                            Discrete set, three grades of possibility
Continuous Rough Set                          Set with three grades of possibility
Discrete Fuzzy Set                            Discrete set with infinite grades of possibility
Continuous Fuzzy Set                          Continuous set with infinite grades of possibility
Linearly Defined Fuzzy Set                    Fuzzy set defined by a sequence of line segments
Discrete Probability Distribution             Discrete probability distribution (PMF)
Continuous Probability Distribution           Continuous probability distribution
Linearly Defined Probability Distribution     Continuous distribution defined by a sequence of line segments
Uniform Continuous Probability Distribution   Uniform distribution over a bounded interval
Gaussian Probability Distribution             Normal distribution defined by a mean and variance

Table 5.3: Common Information Uncertainty Modeling Types
to not be a member of the set (see the principle of representation in Section 4.3.1). The
set is discrete and itemizes every possible collapse.
The interval type stores a continuous set of values that is bounded by an upper and a lower value. Each boundary can either be inclusive, which means that the boundary
value is an element of the set, or exclusive, in which case the boundary is not included.
The representative value defaults to the mid point between the boundaries. The interval
data structure is illustrated in Figure 5.6.
representative value: real
lower bound → { value: real, inclusive: boolean }
upper bound → { value: real, inclusive: boolean }
Figure 5.6: The Interval Data Structure
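The interval record can be sketched as a small class. This is a hedged Python illustration (the class and field names are my own, not the thesis implementation), with the representative value defaulting to the midpoint as described above:

```python
from dataclasses import dataclass

@dataclass
class Bound:
    """One boundary of an interval: its value and whether it is included."""
    value: float
    inclusive: bool = True

@dataclass
class Interval:
    """Continuous set of values bounded by a lower and an upper value."""
    lower: Bound
    upper: Bound
    representative: float = None  # defaults to the midpoint

    def __post_init__(self):
        if self.representative is None:
            self.representative = (self.lower.value + self.upper.value) / 2.0

    def contains(self, x: float) -> bool:
        """Membership test honoring inclusive/exclusive boundaries."""
        above = x > self.lower.value or (self.lower.inclusive and x == self.lower.value)
        below = x < self.upper.value or (self.upper.inclusive and x == self.upper.value)
        return above and below

iv = Interval(Bound(2.0), Bound(3.0))
# iv.representative == 2.5; iv.contains(2.0) is True
```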
The set of intervals type is used to describe arbitrary continuous sets. This data
structure is required to allow a continuous set that is not convex, since intervals are
necessarily convex. The representative value can be chosen by the user, so long as it
adheres to the principle of representation.
The continuous rough set type is used to describe an arbitrary rough set. Rough
sets allow three grades of possibility: elements are completely within the set, partially
within the set, or not within the set, which we map to membership values of 1, 0.5,
and 0, respectively. The continuous rough set data structure consists of a sequence
of membership change markers, which indicate the points at which the membership
function changes. The markers consist of two pieces of information: firstly, the point
at which the membership function changes; and secondly, the new value that the membership function takes from that point until the next marker. The final marker's value holds from that point to positive infinity, while the membership for values from negative infinity to the lowest marker is specified in a separate field called priorμ.
Figure 5.7 illustrates the rough set data structure and its membership function. The
priorμ is zero, thus the membership function is zero between −∞ and the first point.
The first change in the function is at 2, as defined by the first marker in the sequence.
Each marker signals a change in membership and the last marker sets the membership
value from there to ∞. Although the representative value can be any value that has a
non-zero membership, it is usual for the representative value to have unit membership.
The discrete rough set type is similar to the discrete classical set, except that each element additionally specifies a degree of membership, {value, membership}. The degree of membership must be 1, 0.5, or 0.
priorμ: 0
markers: { {2, 0.5}, {5, 1}, {11, 0.5}, {12, 0} }
Figure 5.7: Definition of Continuous Rough Set Using a Marker Sequence
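Evaluating this marker-based membership function can be sketched in a few lines of Python. This is a hedged illustration; the function name is my own and the thesis does not specify an implementation:

```python
import bisect

def rough_membership(x, prior_mu, markers):
    """Evaluate a continuous rough set's membership function at x.

    markers is a sorted list of (point, new_membership) pairs; each marker's
    membership value holds from its point until the next marker, and the
    final marker's value holds to positive infinity.  prior_mu covers all
    values below the first marker.
    """
    points = [p for p, _ in markers]
    i = bisect.bisect_right(points, x)  # number of markers at or below x
    if i == 0:
        return prior_mu
    return markers[i - 1][1]

# The set from Figure 5.7:
markers = [(2, 0.5), (5, 1), (11, 0.5), (12, 0)]
# rough_membership(1, 0, markers) == 0; rough_membership(6, 0, markers) == 1
```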
The linearly defined fuzzy set type models a fuzzy set whose membership function
is defined by a sequence of connected line segments. Similarly to the continuous rough
set, a sequence of markers is used. However, the markers now indicate the vertices
between line segments. The value of the first marker is assumed to extend to −∞ and
the value of the last marker extends to ∞. The membership function for values between
markers is a linear interpolation of the two closest markers. Figure 5.8 illustrates the
membership function and data for a fuzzy set. It is usual for the representative value to
have maximum membership.
markers: { {2, 0}, {5, 1}, {11, 1}, {12, 0} }
Figure 5.8: Definition of Linearly Defined Fuzzy Set Using a Marker Sequence
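The interpolation rule above can be sketched as follows. This is a hedged illustration; the function name is my own:

```python
def fuzzy_membership(x, markers):
    """Membership of x in a linearly defined fuzzy set.

    markers is a sorted list of (point, membership) vertices; membership is
    linearly interpolated between adjacent vertices, and the first/last
    vertex values extend to negative/positive infinity respectively.
    """
    if x <= markers[0][0]:
        return markers[0][1]
    if x >= markers[-1][0]:
        return markers[-1][1]
    for (x0, m0), (x1, m1) in zip(markers, markers[1:]):
        if x0 <= x <= x1:
            # Linear interpolation between the two closest markers.
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

# The set from Figure 5.8:
markers = [(2, 0), (5, 1), (11, 1), (12, 0)]
# fuzzy_membership(3.5, markers) == 0.5
```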
The discrete fuzzy set type is similar to the discrete rough set, except that the de-
gree of membership can be any value from 0 to 1. Again, the representative value
usually has the highest membership value. Similarly, the discrete probability distri-
bution stores a finite set of {value, probability} combinations. While the range of the
probability components is [0,1], their sum must equal 1. The linearly defined prob-
ability distribution is the probability version of the linearly defined fuzzy set and is
implemented using the same mechanism. This data structure provides the freedom to
approximate almost any distribution by sampling it. However, this approximation may
not be sufficiently accurate for certain applications.
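The constraints on a discrete probability distribution, each probability in [0, 1] and the probabilities summing to 1, can be checked mechanically. A small hedged sketch (the function name is illustrative):

```python
def validate_pmf(pairs, tol=1e-9):
    """Check a discrete probability distribution, given as a list of
    (value, probability) pairs: each probability must lie in [0, 1] and
    the probabilities must sum to 1 (within a floating-point tolerance)."""
    if any(not 0.0 <= p <= 1.0 for _, p in pairs):
        return False
    return abs(sum(p for _, p in pairs) - 1.0) <= tol

pmf = [("heads", 0.5), ("tails", 0.5)]
# validate_pmf(pmf) is True; a PMF summing to 1.4 would be rejected
```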
Two common probability distributions are the uniform continuous probability dis-
tribution and the Gaussian probability distribution (also referred to as the normal dis-
tribution). The uniform distribution is defined by the bounds of an interval and all
values within this range are equally likely. The PDF of the uniform distribution is f(x; a, b) = 1/(b − a) for a ≤ x ≤ b and 0 otherwise, where a and b define the interval. The Gaussian distribution also requires only two parameters: the mean, μ, and the standard deviation, σ. Its PDF is f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).
There exist several well-known probability distributions that are evaluated analytically. These include the beta distribution, which is supported on the bounded interval [0, 1], and whose PDF is f(x; α, β) = x^(α−1)(1 − x)^(β−1) / ∫₀¹ u^(α−1)(1 − u)^(β−1) du. Other examples are the Laplace distribution, the exponential distribution, and the Cauchy-Lorentz distribution. Interested readers are directed to probability texts such as [111].
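The uniform and Gaussian PDFs can be evaluated directly from the two-parameter definitions above; a small Python sketch:

```python
import math

def uniform_pdf(x, a, b):
    """PDF of the uniform distribution on the interval [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, mu, sigma):
    """PDF of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# uniform_pdf(2.5, 2, 4) == 0.5; gaussian_pdf peaks at x == mu
```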
5.3 Automated Propagation of Information Uncertainty
Uncertainty needs to be propagated to the result whenever an uncertain variable is involved in an operation. Thus, supporting arithmetic for information uncertainty requires
that methods be defined to propagate the uncertainty correctly. Current analytical un-
certainty methods require propagation to be carried out manually. This is cumbersome
and makes it easy to introduce avoidable errors.
This section is devoted to automatic propagation of uncertainty. Two aspects are
covered. Firstly, a mechanism for supporting automated propagation is presented,
called the uncertainty propagation model. This mechanism is configurable and exten-
sible, but it requires rules to be defined for different combinations of variable types and
these can become numerous. The second part presents the hierarchical heterogeneous
propagation, which provides a means for resolving a default course of action. This
reduces the complexity of the propagation model as fewer rules need to be defined.
5.3.1 Uncertainty Propagation Model
In a typical spreadsheet system there are several types of operators, ranging from basic arithmetic through to pseudo-random number generators. For the purpose of this section, an operation refers to an indivisible unit of work: it has a single output and optionally multiple inputs. In order to provide automated uncertainty propagation, most
of these operators will require extensions to deal with information uncertainty. Partic-
ularly, operators that take at least one variable as their input will likely propagate the
uncertainty from the variable (or variables) to their output.
New information uncertainty modeling techniques might be added to the system.
For example, someone may wish to implement a beta distribution type. It is therefore
necessary to have a mechanism to add appropriate propagation methods as well. Fur-
thermore, some users and domains may have task-specific propagation requirements.
This means that the user will wish to swap out certain propagation methods for alterna-
tive implementations. For example, in a risk assessment application the user may wish
to employ a principle of maximal uncertainty, whereas a remote sensing user may find
a principle of minimal uncertainty to be more appropriate.
There are three requirements for the automated propagation system. Firstly, it must be extensible, allowing new propagation methods to be added in the future. Secondly,
it must be able to resolve which method to employ in the face of multiple alternatives.
Thirdly, it should be configurable by the user: users will want to choose which methods to use, and may change their mind.
Most commonly used operators in a spreadsheet system will be either unary (taking
one input) or binary (taking two inputs). The propagation model described here is gen-
eralized to handle any n-ary operator. This includes n = 0, which takes no parameters.
We define the Uncertainty Propagation Model (UPM), which uniquely maps an operator-operand signature (sig) to an Uncertainty Propagation Method (method). The sig consists of the operator identifier (e.g. "+" for addition) and the types of the operands. The propagation method is a handler that takes the operands and returns a result. Thus,
UPM: sig → method
where sig = {String, Type, Type, ...} and method: {C1, C2, ...} → C, where C is an object encapsulating an information unit and its associated uncertainty details. For example, consider the following equation: x = 8 + ~5, where ~5 means "approximately five". The signature becomes {"+", Quantity, Estimate}, indicating the addition of a quantity to an estimate type. The propagation model maps this signature to a method that can handle addition between quantity and estimate types. The parameters to the method are {8, ~5} and it returns ~13, which means "approximately thirteen".
When a sig cannot be found in the propagation model, the operation is illegal and
the result is an error. Typically, an error condition is handled by returning a special
error method, which takes any arguments and returns an error as the result. A sig
may also map to nil, explicitly indicating that no conversion is permitted and attempts
to do so will result in an error. This enables users to deliberately disallow particular
combinations.
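A propagation model of this kind can be sketched as a lookup table from signatures to handlers. This is a minimal Python illustration of the mechanism described above; the class names, the Estimate wrapper, and the error convention are my own assumptions, not the thesis implementation:

```python
class Quantity:
    """A plain value; uncertainty is ignored."""
    def __init__(self, value):
        self.value = value

class Estimate(Quantity):
    """An unsubstantiated estimate: a value with unquantified uncertainty."""

def add_quantity_estimate(a, b):
    # The sum of a quantity and an estimate is itself an estimate (~13 in the text).
    return Estimate(a.value + b.value)

def error_method(*args):
    raise TypeError("no propagation method for this signature")

UPM = {
    ("+", Quantity, Estimate): add_quantity_estimate,
    ("+", Estimate, Quantity): lambda a, b: add_quantity_estimate(b, a),
    ("+", Quantity, Quantity): lambda a, b: Quantity(a.value + b.value),
    # An explicit nil entry (None) would deliberately forbid a combination.
}

def apply_op(op, *operands):
    """Build the signature from the operator and operand types, then dispatch."""
    sig = (op,) + tuple(type(x) for x in operands)
    method = UPM.get(sig, error_method)
    if method is None:  # explicit nil mapping: combination forbidden
        method = error_method
    return method(*operands)

result = apply_op("+", Quantity(8), Estimate(5))
# result is an Estimate with value 13, i.e. "approximately thirteen"
```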
5.3.2 Hierarchical Heterogeneous Propagation
The number of entries in an uncertainty propagation model will grow exponentially
as new data types are added to the system. Adding methods can therefore become
a significant task. Hierarchical heterogeneous propagation helps to overcome this is-
sue by searching for a suitable combination using existing methods. Parameters are
implicitly converted into compatible types. This requires that the uncertainty model-
ing techniques be integrated into a coherent framework such that the uncertainty can
be captured using interchangeable models. This is similar to the aims of generalized
information theory (see [60]).
The principle behind hierarchical heterogeneous propagation is to arrange uncer-
tainty modeling techniques in a hierarchical order of increasing uncertainty detail. We
group these techniques into three strata: the top tier being the crisp strata; the middle
tier being the bounded strata; and the lower tier being the explicit strata. Categories
of the information uncertainty modeling techniques and their stratification is shown
in Figure 5.9. The crisp strata includes the singular valued types, which are known
value, uncertainty ignorance, and estimate. These types specify a singular value and
optionally describe the presence or otherwise of uncertainty, but do not specify further
details about any potential uncertainty. The bounded strata includes classical set based
uncertainty types, which are classical sets, intervals, and sets of intervals. These types
specify the boundary between the possible collapse values and those that are not con-
sidered possible. The explicit strata includes types that explicitly specify the degree of
uncertainty for potential values, such as rough sets, fuzzy sets, and probabilities.
crisp strata: known value, uncertainty ignorance, estimate
bounded strata: non-specificity
explicit strata: probability, membership, belief
(uncertainty detail increases from the crisp to the explicit strata)
Figure 5.9: The information uncertainty modeling techniques sorted into three strata
Estimates belong in the crisp strata, because estimates represent values that are
treated as though they were the true value. No information about the uncertainty is
known about an estimate, beyond its existence. In the bounded strata, the modeling
techniques encode the boundaries between what is possible and what is not. For exam-
ple, the accuracy of a measurement device can be specified as ±10 units. This means
that it is certain that the measurement is within 10 units (assuming the device is work-
ing correctly), however, nothing is stated about the uncertainty of the possible values.
Put another way, it is not stated whether +1 is more certain than, less certain than, or equally certain as +2.
The explicit strata contains modeling techniques that explicitly state the degree of un-
certainty of all candidate values. For example, with reference to the probability in
Table 5.1 (d), it is more uncertain that the predicted employment growth rate in Cali-
fornia will be -0.3 than 0.147.
Each strata provides increasing detail about the uncertainty over the previous one.
As users refine their model, they can progress downward along the strata with the ad-
dition of more information. Thus, a variable may begin as an estimate, then be refined
to an interval once the extents are known, then be refined further into a probability
distribution as the full likelihood becomes known. This refinement is implied by the
arrows in Figure 5.9. The reverse operation is also possible where, by removing or
simplifying information, the variable can be modeled using a technique that is further up the level-of-detail tree. Figure 5.10 shows an example of how a variable might
proceed to be refined. The left hand path shows an example of a convex variable, while
the right hand path shows a non-convex example. Convexity is defined by continuity
of the collapses.
Single-valued Variable (Information: Assumed Total; Uncertainty: Assumed None)

Convex path:
Convex Set (Interval) (Information: Bounded Range; Uncertainty: Non-specificity)
Graded Convex Possibility (Information: Graded Possibility; Uncertainty: Graded Non-specificity)
Convex Probability Distribution (Information: Probability Distribution; Uncertainty: Strife of Outcome)
Imprecise Probability (Convex) (Information: Probability, Intervals; Uncertainty: Probability & Outcome)

Non-convex path:
Classic Set (Non-convex) (Information: Bounded Range; Uncertainty: Non-specificity)
Graded Arbitrary Possibility (Information: Graded Possibility; Uncertainty: Graded Non-specificity; rough set is a specific case)
Arbitrary Probability (Information: Probability Distribution; Uncertainty: Strife of Outcome)
Arbitrary Imprecise Probability (Information: Probability, Intervals; Uncertainty: Probability & Outcome)
Figure 5.10: Example of Increasing Levels of Uncertainty Information
Adding uncertainty information to a variable is called promotion and removing de-
tail is called demotion. A change in the uncertainty modeling technique of a variable
that does not change its level of detail is a conversion. The hierarchical heterogeneous
propagation mechanism uses a combination of promotions, demotions, and conversions until it can find a suitable signature in the propagation model.
Promotion, demotion, and conversion are performed using casting operators. These operators take a single operand and return an approximately equivalent representation using a different modeling type. These operators are part of the propagation model, such that P ⊆ PM, D ⊆ PM, N ⊆ PM, and P ∩ D = P ∩ N = D ∩ N = ∅, where P is the set of promoters, D is the set of demoters, and N is the set of converters.
A mapping to nil in the propagation model indicates a combination that the user wishes to forbid. In this case the hierarchical heterogeneous propagation search is not invoked and the operation remains invalid. Once the hierarchical heterogeneous propagation search has been invoked, it will ignore all nil entries.
Two directed graphs are built: the promotion graph from P∪N and the demotion
graph from D∪N. The vertices are the types of uncertainty modeling techniques,
and the edges are the casting operators. These may be disconnected graphs, should
the user or propagation model wish to exclude certain combinations. The hierarchical
heterogeneous propagation algorithm is a path search to find a suitable combination of
parameter types for which a (non-nil) mapping in the propagation model can be found.
Should such a combination not be found, then the operation results in an error.
The graphs are walked for each parameter, minimizing a cost function. We choose
the cost function to be the number of edges in the path. Alternative cost functions
require weightings or other annotations to be added to the graphs and this is beyond
the scope of this project. We use a parameter, called the favored direction, which
dictates whether the promotion or demotion graph is walked first. It is also possible
to use the complete casting graph (i.e. the graph of P∪D∪N), however this does not
provide control over the favored direction.
To illustrate, consider a system that supports three data types: quantities, estimates,
and intervals. In this system the following promotion operators have been defined:
quantity→estimate and estimate→interval. The following demotion operators have
been defined: interval→estimate and estimate→quantity. The promotion graph will
be quantity→estimate→interval and the demotion graph will be interval → estimate
→ quantity (see Figure 5.11). For the purpose of this example, the propagation model
only defines arithmetic between parameters of the same data type and the hierarchical
heterogeneous propagation method is promotion favoring. Adding a quantity to an interval will therefore find a solution that converts the quantity parameter to an interval using two edge hops. Thus, the system will promote the quantity to an estimate, then
to an interval, then perform the interval arithmetic.
To illustrate the previous example using numbers, consider x = 5 and y = [2, 3]. Then,
promotion direction: Quantity → Estimate → Interval
demotion direction: Interval → Estimate → Quantity

Figure 5.11: Sample Promotion/Demotion Graph
x + y = 5 + [2,3] , incompatible
→ ~5 + [2,3] , incompatible
→ [5,5] + [2,3] , success
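The search in this worked example can be sketched as a breadth-first walk over the casting graph. This is a simplified Python illustration under the example's assumptions (only same-type signatures in the propagation model, promotion casts only, tuple-based intervals); the names and the fixed target order are my own, and a full implementation would also walk the demotion graph and minimize the number of casts:

```python
from collections import deque

# Casting operators for the example: promotion moves a value up the
# detail hierarchy (quantity -> estimate -> interval).
PROMOTE = {
    ("quantity", "estimate"): lambda v: v,        # 5 -> ~5 (value unchanged)
    ("estimate", "interval"): lambda v: (v, v),   # ~5 -> [5,5]
}
UPM = {
    ("+", "interval", "interval"):
        lambda a, b: (a[0] + b[0], a[1] + b[1]),  # interval arithmetic
    ("+", "quantity", "quantity"): lambda a, b: a + b,
}

def cast_path(src, dst, graph):
    """Breadth-first search for a chain of casting operators from src to dst."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        t, path = queue.popleft()
        if t == dst:
            return path
        for (a, b), cast in graph.items():
            if a == t and b not in seen:
                seen.add(b)
                queue.append((b, path + [cast]))
    return None

def add(xtype, x, ytype, y):
    """Resolve '+' by casting both operands to a common type with a UPM entry."""
    for target in ("quantity", "estimate", "interval"):
        method = UPM.get(("+", target, target))
        px, py = cast_path(xtype, target, PROMOTE), cast_path(ytype, target, PROMOTE)
        if method and px is not None and py is not None:
            for cast in px:
                x = cast(x)
            for cast in py:
                y = cast(y)
            return method(x, y)
    raise TypeError("no propagation path found")

# x = 5 (quantity), y = [2, 3] (interval):
result = add("quantity", 5, "interval", (2, 3))
# result == (7, 8), i.e. [5,5] + [2,3] = [7,8]
```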
Suppose that the promotion from estimate to interval was not defined, i.e. the pro-
motion graph consists only of quantity→estimate. In this case the promotion graph will
not yield a result and the hierarchical heterogeneous propagation search will therefore
fall back on the demotion graph. The solution of demoting the interval parameter to a
quantity is found and standard arithmetic is employed.
The hierarchical heterogeneous propagation mechanism makes propagation mod-
els simpler and easier to manage. It will produce the best results for propagation mod-
els that have well defined casting operators. However, there is less control over the
accuracy of the results as it depends on the casting operators to maintain the semantic
meaning of the uncertainty information, which may not always be reliable.
Depending on the complexity of the uncertainty modeling hierarchy and the fre-
quency of calls, the searches may take up noticeable computing resources. In practice
this can be alleviated by either caching results, or by using hierarchical heterogeneous
propagation offline to generate a more fully defined propagation model. This offline
process involves inserting propagation methods that incorporate the conversion process.
5.4 Summary
This chapter described the integration of information uncertainty modeling techniques
into a unified information uncertainty framework and methods for automating the prop-
agation of uncertainty information.
The encapsulation approach to information uncertainty intrinsically associates in-
formation and its uncertainty. This ensures that uncertainty parameters do not become
separated; either from each other or from the related unit of information. The sys-
tem can now manage the uncertainty information, since it is intrinsically aware of it.
The unified information uncertainty framework details the conceptual relationships between different modeling types and facilitates the formulation of strategies for conversion between types.
An uncertainty propagation model is a means for automating uncertainty propaga-
tion: it catalogs methods for handling different combinations of uncertainty modeling
types under different operations. Where the propagation model does not explicitly
define an operation for a particular combination of uncertainty modeling techniques,
the hierarchical heterogeneous propagation mechanism can be used. Hierarchical het-
erogeneous propagation uses a hierarchical structure to implicitly provide methods for
propagation. This has the effect of simplifying the propagation model and increasing
the robustness of the propagation mechanism.
CHAPTER 6
Uncertainty Abstraction for Visualization
6.1 Motivation and Objectives
Information uncertainty visualization is made difficult by the diversity of modeling techniques. Visualization techniques tend to be created for particular types,
which makes it challenging to subsequently change the uncertainty information. It
would therefore be beneficial to have a coherent visualization framework that is ca-
pable of simplifying the visualization technique selection process while avoiding data
type lock-in. This chapter presents two components that make up such a framework.
Firstly, a user-objectives approach to information uncertainty visualization aids users
in selecting visualization techniques according to their visualization aims. The sec-
ond contribution consists of uncertainty abstraction models, which enable visualization
techniques to be built in a data-type-independent manner.
It is commonly recognized that formal approaches to visualization are required to
avoid the disadvantages of ad hoc solutions. Visualization has traditionally followed
a data-driven approach (e.g. [15, 16, 115]), where the nature and structure of the data
set forms the basis of visualization choices. This approach provides a coherent and
structured framework, but it does not take into account the needs of tasks or users.
More recent work has taken a task-driven approach, which addresses the specific requirements of application domains by guiding visualization based on the task at hand
(e.g. [11, 57, 82]). This approach better reflects the needs of the user, but is not suffi-
ciently generic: The techniques may work very well in one domain, but might not be
transferable to other domains unmodified.
We take a user-objectives approach, which aims to reduce the limitations of fo-
cusing on specific tasks, while simultaneously providing a coherent framework. An
objective represents a goal that a user aims to achieve. Objectives are at a more abstract and generic level than tasks, making it possible to create a coherent visualization framework that is generic while still focusing on user needs.
There is a lack of consistency between current information uncertainty visualiza-
tion techniques, since they are typically developed for particular data structures. Un-
certainty abstraction models provide a consistent interface between the visualization
technique and the uncertainty modeling technique. These can be used to overcome
this issue and therefore offer a broader base of visualization techniques from which to
choose. Three uncertainty abstraction models are described here: the Unified
Uncertainty Model, the Dual Uncertainty Model, and the Quad Uncertainty Model.
Each provides differing levels of discrimination between alternative views of uncer-
tainty. These models can also be composed recursively.
This chapter is organized as follows. Section 6.2 describes the user-objectives for
information uncertainty visualization. A computer-aided selection algorithm is pre-
sented to show how user-objectives can be incorporated into a visualization system.
Section 6.3 details the uncertainty abstraction models, including theory, notation, de-
sign, and use. Two case studies follow. The first, in Section 6.4, examines how user-
objectives can be used to visualize uncertainty for financial decision support. The
second examines simplification of business process specifications using the relevancy
objective and is presented in Section 6.5.
6.2 User-objectives for Information Uncertainty Visualization
Our work has identified five visualization user-objectives that relate to information uncertainty. These are possibility, reliability, relevancy, uncertainty structure discovery,
and ignore uncertainty. Each objective relates to a specific type of insight that the user
is seeking. Table 6.1 illustrates the essential differences between the visualization ob-
jectives by listing an English language query that characterizes questions suitable for
each objective along with an example of where the objective would be used.
Visualization techniques transform information into visual elements. Figure 6.1
provides a visual illustration of the conceptual differences between each of the ob-
jectives. The graphs indicate the relationship between the visibility and the level of
uncertainty. Visibility refers to the degree to which the viewer’s attention is drawn to a
feature.
Objective | Characteristic Query | Example of Use
Possibility | "What is possible?" or "To what extent could I be wrong?" | Choosing from or comparing a number of alternative options
Reliability | "What is most likely?" or "What is most certain?" | Decision support, with degree of confidence
Uncertainty Structure Discovery | "How certain or uncertain is this?" or "What is the structure of the uncertainty?" | Evaluation of the uncertainties involved in a particular scenario
Relevance | "What is relevant?" | Simplification of a complicated system or intelligent linked views
Ignore Uncertainty | "What if there were no uncertainty?" | Situations where the uncertainty is small enough to be tolerable

Table 6.1: Information Uncertainty Visualization Objectives

• The possibility objective makes no distinction between degrees of uncertainty, provided it is non-zero. In other words, as long as an event is possible, it is shown to the user.
• The reliability objective gives higher visibility to information that is more cer-
tain.
• The structure discovery objective brings different levels of uncertainty to the
attention of the user, to provide insight into the structure of the uncertainty.
• The relevancy objective shows only information with a high certainty of rele-
vance to the user’s task.
6.2.1 Analysis of User-Objectives

6.2.1.1 Possibility Objective
The possibility objective is motivated by the characteristics of the non-specificity class
of uncertainty modeling methods and answers the question “what are competing pos-
sibilities?”. The purpose for the user is to gain an overview perspective and be aware
of the various possibilities. Often the user is concerned by the range of alternatives to
a particular interpretation of information.
The possibilities may be either discrete or continuous. Discrete possibilities represent a finite number of distinct alternatives. An example of a discrete possibility visualization is a graph of several potential profit predictions plotted on the same axes.

Figure 6.1: Graphs Illustrating the Visual Treatment of Information with Variable Degrees of Uncertainty under Different Objectives (panels: Possibility, Reliability, Structure Discovery, and Relevancy Objectives)
Continuous possibilities are commonly used to model error, where the true value is one
of a continuous range of possible values. An example of visualization using continuous
possibilities is a 3D volume model in medical imaging applications where the extent
of the potential error introduced during the scanning process is made explicit by using
translucent regions (this example can be seen in [51, pp.9]). The possibility objective
is the same for all information uncertainty modeling techniques: provided the degree
of certainty is not zero, the information is considered a possibility.
6.2.1.2 Reliability Objective
Reliability-based visualization expresses the degree of confidence in information. In
this case, the user wishes to understand the extent to which they can rely on the infor-
mation. This visualization objective mirrors the minimal uncertainty principle, where
the user is most interested in information with low uncertainty.
Unlike the possibility objective, where no explicit attention is given to the degree
of the uncertainty, the reliability objective requires this information to be available.
As such, variables using a non-specificity class of information uncertainty modeling
technique are not strictly suitable for reliability objectives.
An example of reliability visualization is a graph of projected profits, where the
projections are more translucent for regions that are less likely. More than one source
of uncertainty can be visualized concurrently by mapping them to different visual fea-
tures.
6.2.1.3 Uncertainty Structure Discovery Objective
The structure discovery objective seeks exposition of the uncertainty itself. The pur-
pose is to draw attention to the uncertainty within the information to better understand
its structure.
Similar to the reliability objective, the degree of uncertainty must be present within
the information uncertainty modeling technique. The degree of uncertainty is mapped
to visual features in order to distinguish information based on its uncertainty. Unlike
the reliability objective, the user is not mainly concerned with the most certain infor-
mation. To illustrate, the purpose of the visualization may be to highlight regions of
high uncertainty. One example that draws attention to high uncertainty is a volume
visualization that uses the degree of uncertainty to determine opacity, with regions of
higher uncertainty being more opaque [25].
An example of a situation with a structure discovery objective is where the user
wishes to understand the uncertainty associated with the predicted outcomes should they proceed with a particular decision. The visualization could be a graph of projected sales with the degree of uncertainty being color-coded. The user may then look
at the graph and ask the question “how confident are we about making x sales?” and
the answer will be given by the color of the predicted sales at x.
6.2.1.4 Relevancy Objective
Relevancy visualization displays information that is relevant to the task of the user.
These techniques necessarily contain introduced error, since the determination of the
degree to which information is relevant to the user is itself uncertain. Additionally, the
primary use for relevancy visualization is to reduce the amount of information in the
visualization. Reduction adds a degree of uncertainty due to its non-reversible nature,
where multiple variations of original data sets can result in the same reduced set.
Since visualizations for the relevancy objective order features according to their
relevance to the user’s task, the degree of relevancy must be determined for each fea-
ture. To achieve this, a criterion function is defined that maps features to their degree
of relevancy [110]. Different user tasks will have different criteria for what constitutes
relevant information and constructing a suitable criterion function will typically be task
specific.
Although uncertainty is introduced during the visualization process, this does not
preclude investigation of uncertainty contained in the data model. Where there is un-
certainty in the data model, the criterion function must consider this. An example
where the criterion function depends on the uncertainty in the data model is where the
user is seeking to identify areas for improving measurement accuracy, giving variables
with a high amount of potential error greater relevance.
The objective of the user is to see only the information that is of sufficient relevance
to a task. Less relevant information is either aggregated until it becomes relevant or
discarded entirely.
6.2.1.5 Ignore Uncertainty Objective
Sometimes the user will want to ignore uncertainty for the purposes of visualization.
This is effectively the degenerate case of uncertainty visualization, where standard
techniques are used and the uncertainty information is excluded.
6.2.2 A Computer-Assisted User-Objectives Selection Method

The user-objectives can be applied manually, as was done in the case study of Section 6.4. However, there exists repetition and commonality that can be exploited to
create an automated elicitation process. Automated user-objectives elicitation can be
used to augment the visual mapping process. For example, a computer guided system
that presents the user with a list of available variables can be extended to include an
interactive objectives selection tool. This tool asks the user a series of questions and
then offers suggested objectives. The interactive tool is outlined in Algorithm 1, where
Q1 through Q5 refer to the questions in Table 6.2.
Questions in the decision tree:
1. “Is the query about all possibilities (including highly uncertain information)?”
2. “Is the query about what is most “likely” or “sure”?”
(a) “Is it more important to show “how sure” or “only the most sure”?”
3. “Is the query about the level of uncertainty in the variable itself?”
4. “Is the query a grouping/filtering/processing of information according to a subjective or otherwise uncertain measure?”
5. “Is it necessary to reduce the clutter (simplify the visualization)?”
Table 6.2: Questions Used to Elicit the User-Objective
Once the objectives have been specified, the visual mapping software can use this
information to help guide visualization choices. For example, if the user has a pos-
sibility objective for a particular variable, then the visualization system will propose
adding geometry to expose all possible values for that variable. On the other hand,
if it were a structure discovery objective, then the visualization system will propose a
pseudo-color mapping to highlight the different degrees of certainty.
Algorithm 1 User-objective Selection Method

present user with list of variables
user chooses the variables they wish to include in the visualization
for each variable do
    o ← {}
    if Q1 is true then
        o ← o ∪ { possibility }
    end if
    if Q2 is true then
        if Q2a is first option then
            o ← o ∪ { reliability }
        else if Q2a is second option then
            o ← o ∪ { ignore }
        else
            o ← o ∪ { reliability, ignore }
        end if
    end if
    if Q3 is true then
        o ← o ∪ { structure }
    end if
    if Q4 is true then
        o ← o ∪ { relevancy }
    end if
    if Q5 is true then
        o ← o \ { possibility, structure }
    end if
    if o = {} then
        o ← { ignore }
    end if
end for
group variables with similar objectives
present groupings to user in a list
if only one objective then
    automatically select it
else
    allow the user to specify which ones (e.g. multi-select)
end if
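A minimal Python sketch of the per-variable portion of Algorithm 1 follows. The dictionary keys and option strings used for the question answers are our own assumptions; an interactive system would collect these answers from the dialogue of Table 6.2.

```python
def select_objectives(answers):
    """Map decision-tree answers (Q1..Q5, plus Q2a) to a set of user-objectives.

    `answers` is a dict such as {"Q1": True, "Q2": True, "Q2a": "how sure"};
    the key names and option strings are illustrative assumptions only.
    """
    o = set()
    if answers.get("Q1"):
        o.add("possibility")
    if answers.get("Q2"):
        q2a = answers.get("Q2a")
        if q2a == "how sure":            # first option of Q2a
            o.add("reliability")
        elif q2a == "only most sure":    # second option of Q2a
            o.add("ignore")
        else:
            o.update({"reliability", "ignore"})
    if answers.get("Q3"):
        o.add("structure")
    if answers.get("Q4"):
        o.add("relevancy")
    if answers.get("Q5"):                # clutter reduction removes detail views
        o -= {"possibility", "structure"}
    if not o:
        o = {"ignore"}                   # default when no objective applies
    return o
```

For example, answering yes to Q1, Q3, and Q5 leaves no detail views standing, so the method falls back to the ignore objective, matching the behavior of the pseudocode.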
The objectives-selection method can be used with adaptive visualization
systems. For example, the visualization task network [11] can be extended to account
for objectives by learning different weights for each objective. The objectives form the
top level of the new visualization task network, following which are domain-specific
tasks, followed by visualization techniques, and then the visual feature mappings.
In summary, the user-objectives approach to information uncertainty visualization
offers a mechanism for aiding the user to select visualization choices. Objectives rep-
resent the type of insight that the user is aiming to gain from their visualizations. The
advantage of this approach is that the visualization requirements are not driven by the
data type, reducing data type dependency. Furthermore, the objectives approach avoids
issues of domain specificity that typically result from task-driven approaches.
6.3 Uncertainty Abstraction Models

The user-objectives approach aids users designing their visualization; however, it does not solve the data type dependency issue for implementors of visualization techniques.
Visualization techniques require a mechanism that enables them to work across differ-
ent uncertainty modeling techniques. This capability can be provided through the use
of uncertainty abstraction models, which specify an interface between visualization
techniques and information uncertainty modeling techniques.
This section investigates different abstraction models for information uncertainty
that are designed for visualization. Three abstract uncertainty models are described:
the Unified Uncertainty Model (UUM), which is the simplest of the three; the Dual
Uncertainty Model (DUM), which differentiates between possibility and probability;
and the Quad Uncertainty Model (QUM), which provides the most amount of detail.
These models can be applied recursively, where the degree of certainty for each view
can itself be described by one of the abstract uncertainty models.
6.3.1 The Unified Uncertainty Model

The advantage of the UUM is that it is simpler and suits the needs of many users.
The disadvantage is that there exists the potential for different uncertainty types to be
confused with one another (e.g. it may be hard to distinguish probabilities and fuzzy
values). This disadvantage is a concern when multiple different types of variables are
mixed in the same visualization.
We define a plural value, which is a modeling-technique-independent function f : ℝ → [0,1] that maps real values to a degree of uncertainty. The visualization system can focus on providing means for mapping the range [0,1] to visual elements, rather than providing methods specific to each type of modeling technique.
Our approach is inspired by the description of fuzzy sets, using the membership
function μ, as a generalized form of crisp sets (e.g. [77]). A fuzzy set is defined using
membership function μ: δ = μ(v) where δ is the level of membership ranging from 0
(definitely not a member) to 1 (definitely a member) and v is the candidate value. Thus
the candidate value 28 is half in the fuzzy set long if long(28) = 0.5. This method of
definition can be applied to crisp sets; for example, the crisp set A can be defined by:

δ = 1 if v ∈ A, 0 otherwise
We expand this reasoning to other uncertainty modeling types. Thus the visualiza-
tion accessible form of information uncertainty is a function:
δ = f (v)
where δ is the degree of certainty ranging from 0 to 1, v is the candidate value, and f ()is the degree of certainty function. Traditional numbers can be considered to be a spe-
cial type of uncertainty modeling technique: the technique specifying total certainty.
The constant c is described by the following uncertainty function:

δ = 1 if v = c, 0 otherwise
δ = 0 indicates impossibility. The true meaning of δ varies with the type of uncertainty
being modeled. For non-specificity types, δ is either 0 (not possible) or 1 (possible).
For membership methods, δ ranges from 0 (definitely not a member) to 1 (definitely a
member). For probabilities and belief, δ ranges from 0 (impossible) to 1 (certain).
Some visualizations are intended to compare the values of δ with each other. De-
pending on the uncertainty of the variable in question, the range of δ can vary. In these
circumstances it is desirable to normalize δ such that min(δ) = 0 and max(δ) = 1. For example, a probability density function mapped directly to opacity will often render too transparently; rescaling so that the mode maps to full opacity restores a usable visual range.
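The UUM can be sketched as plain functions from candidate values to [0,1]. The constructor names below are ours, not the thesis implementation; they build δ-functions for a constant, a crisp set, and a Gaussian density normalized so that max(δ) = 1, as suggested above for comparison visualizations.

```python
import math

def certain(c):
    """Total certainty: delta = 1 only at the constant c (the degenerate case)."""
    return lambda v: 1.0 if v == c else 0.0

def crisp(members):
    """Crisp set membership: delta is either 0 or 1."""
    s = set(members)
    return lambda v: 1.0 if v in s else 0.0

def gaussian_normalized(mean, sd):
    """Gaussian density rescaled so that max(delta) = 1; a raw density
    mapped to opacity would often be too transparent."""
    peak = 1.0 / (sd * math.sqrt(2 * math.pi))  # density at the mean
    def delta(v):
        d = peak * math.exp(-((v - mean) ** 2) / (2 * sd ** 2))
        return d / peak                          # normalize to [0, 1]
    return delta
```

A visualization layer can then map any such δ-function to visual elements without knowing which modeling technique produced it.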
6.3.2 The Dual Uncertainty Model

The DUM views uncertainty information from two different points of view: possibility,
where potential values are given by their possibility to represent the actual value; and
probability, which specifies the likelihood of alternative values or the belief in the
likelihood.
Figure 6.2 shows the components of the uncertainty model. At the highest level,
there is a singular representative value that is considered to be a reasonable approxi-
mation. This value is used in instances where the visualization only wishes to present
a single choice. The representative value is most commonly one of the most possible
and most likely values. To conform to the principle of representation, the representa-
tive value should be a valid collapse for the uncertainty model. For both the possibility
and probability views, there may be multiple ranges of values. This is because not all
uncertainty models are convex. For example, a variable might only validly take on two
distinct values. In this case values in between these two are not valid. The degree of
possibility or probability of values can also vary across the alternatives. For example,
the probability of the mean will be higher than other values in a Gaussian probability
distribution.
Figure 6.2: Schematic Illustration of the Dual Uncertainty Model (a singular view holding the representative value and an is-certain flag, a possibility view holding a set of ranges, and a probability view holding a set of ranges)
The notation used for DUM consists of the singular value, its Boolean certainty
flag, and a large left pointing angle bracket to the right of which there are two lines. The
top line lists the possibility function and the second line lists the probability function:
a = value ⟨ possibility
            probability
The degree of certainty specified by the range of possibility and probability functions
must be in [0,1]. The domain is the universe of discourse, which is typically the set of
real numbers.
Common instances of the dual uncertainty model for several different types of in-
formation uncertainty are described next using DUM notation.
The unknown value type is not typically used as a data structure. However, for
completeness we include its DUM representation:
unknownValue = NaN ⟨ 1
                     0
where NaN stands for 'Not a Number' and should result in an error when used. The possibility is the full range of all possible values, as any value is technically possible. There is a theoretically even probability that it is any of an infinite number of values, so the probability of any single value chosen at random vanishes. In other words, the probability tends to 0 as the collapse set tends to infinity.
The absolute certainty type is represented as follows:

absoluteCertainty = x ⟨ μ(x) = 1, μ(x̄) = 0
                        P(x) = 1, P(x̄) = 0

where x̄ is any value other than x. It is possible to identify absolute certainty because
the collapse set has a single value and both the possibility and probability of that value
are unity. The absolute certainty type is the only type legally able to return a value of
1 for probability. The absolute certainty type can be seen as a degenerate probability
where the probability is exactly 1.
Both the uninformed estimate and uncertainty ignorance types share the same
DUM profile:
uninformedEstimate = x ⟨ 1
                         0
where x is the estimated value of the variable. These reflect infinite potential to be any
value using the same mechanism as the unknown value type. There is infinite potential
since the value is uncertain, but there is no other description of the uncertainty space.
The classic set based constructions, such as the possibility set, interval, and set of
intervals types have a similar profile:
possibilitySet = x ⟨ 1 ∀{a : a ∈ s}, 0 otherwise
                     p ∀{a : a ∈ s}, 0 otherwise
where s is the set and p is a constant that represents the even probability over the
interval. In the case of a discrete possibility set, p = 1/|s|. In the case of a set of intervals,
p is even over all intervals.
For the rough set and fuzzy set types there are grades of possibility and the possibil-
ity view is already explicitly defined by the data structure. However, the data structure
does not define the probability of any values and often an even probability is assigned
for all possible collapses:
fuzzySet = x ⟨ μ
               p ∀{a : μ(a) > 0}, 0 otherwise

where μ is the membership function and p is the even probability assigned over the support.
The probability distribution types explicitly declare the probability, which implic-
itly provides possibility. Any non-zero probability is possible, even though it may be
highly improbable. This can be a problem because many commonly used distribu-
tions are supported on the whole number line. For example, the normal distribution
is asymptotic to zero as the potential value tends to either ∞ or −∞. Therefore imple-
mentations may wish to use a cut plane on the distribution function to eliminate highly
improbable values from consideration. Doing so simplifies the visualization. Thus, if
α is the height of the cut plane and P is the probability distribution function,
probabilityDistribution = x, false ⟨ 1 ∀{a : P(a) > α}, 0 otherwise
                                     P
There is implicit conversion using the DUM model. Accessing the possibility at-
tributes, for example, performs an implicit conversion to a possibility modeling type.
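As an illustration of the cut-plane construction described above, the possibility view of a probability distribution type can be derived by thresholding its density function. The choice of a Gaussian and the value of α here are our own assumptions for the sketch.

```python
import math

def normal_pdf(mean, sd):
    """Probability density function of a normal distribution."""
    k = 1.0 / (sd * math.sqrt(2 * math.pi))
    return lambda v: k * math.exp(-((v - mean) ** 2) / (2 * sd ** 2))

def possibility_from_pdf(pdf, alpha):
    """DUM possibility view via a cut plane: any value whose density
    exceeds alpha is treated as possible (delta = 1), eliminating
    highly improbable tails from consideration."""
    return lambda v: 1.0 if pdf(v) > alpha else 0.0

P = normal_pdf(0.0, 1.0)
pos = possibility_from_pdf(P, alpha=0.05)  # alpha chosen for illustration
```

With this cut plane, values near the mean are possible while far tail values (e.g. beyond about ±2 standard deviations) are excluded, which is the simplification the text describes.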
6.3.3 Design and Use

In practice, the plural value is defined as a set of uncertainty ranges and provides an
iterator interface returning ranges. Ranges are either given as a sequence of convex
regions, in the case of a continuous function; or as a sequence of values, in the case of
a discrete function. Each range provides degree of uncertainty information. Figure 6.3
shows a UML diagram for the plural value and uncertainty range types along with
the DUM interface. To support the DUM, all uncertainty modeling technique objects
must implement the IDUMMethod interface. The UUM works similarly, except that it
returns just one plural value.
Figure 6.3: UML Diagram of the Dual Uncertainty Model
Discrete and continuous plural values require different visualization approaches.
For continuous plural values, the iterateByValue and iterateByUncertainty methods
sample along the particular dimension. Therefore the visualization should treat sub-
sequent values as connected. For discrete plural values, the iterateByValue will return
sequential values, which the visualization should treat as distinct. Algorithm 2 is a
generic visualization algorithm to handle plural values.
Algorithm 2 Plural Value Plot

for each r in ranges do
    if r.type is DISCRETE then
        for each u in r.iterateByValue do
            plot u distinctly
        end for
    else
        for each u in r.iterateByValue do
            plot u connectedly
        end for
    end if
end for
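A Python sketch of Algorithm 2 follows. The range representation and the `plot_distinct`/`plot_connected` callbacks are our assumptions, standing in for whatever primitives the host visualization provides.

```python
def plot_plural_value(ranges, plot_distinct, plot_connected, samples=32):
    """Generic plural-value plot: discrete ranges are drawn as separate
    marks, continuous ranges as connected samples along the value axis."""
    for r in ranges:
        if r.get("type") == "DISCRETE":
            for v, delta in r["values"]:          # (value, certainty) pairs
                plot_distinct(v, delta)
        else:
            low, high = r["interval"]
            delta_fn = r["delta"]                 # degree-of-certainty function
            for i in range(samples):
                v = low + (high - low) * i / (samples - 1)
                plot_connected(v, delta_fn(v))    # subsequent samples connect
```

Because the plotting callbacks are parameters, the same traversal serves the cluster plot, line graph, and parallel plot extensions discussed below.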
In the case of UUM, there will only be one way to visualize the information. How-
ever, in the case of DUM, the user will need to specify the view that they are interested
in when building the visualization. We now consider how the plural value is used to
extend three common visualization techniques: the cluster plot, the line graph, and the
parallel plot.
The cluster plot is an information visualization technique that plots multiple values
on the same axis. In the traditional cluster plot, each unit of data occupies a single
point. The uncertainty cluster plot instead samples the possible values over the axis,
usually using opacity to indicate the degree of certainty. This is achieved using Algo-
rithm 2. In the case of an interval, the degree of certainty will be one over the range,
resulting in a line for a 1-dimensional cluster plot.
A line graph is built from one or more series of values. It is a 2-dimensional graph:
the value plotted against one axis, and the position within the series against the other
axis. These points are then connected by a sequence of line segments. Extending the
line graph to show uncertainty requires that every value be a plot of the uncertainty
space, all of which is then connected to the next value plot in the series. In the case
of continuous plural values, this results in a polygon (for example, Figure 4.7). The
surface of the polygon can be textured by a sampling of the degrees of certainty over
the range using graphics hardware (for example, Figure 3.1 (d)).
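For continuous plural values modeled as simple per-point intervals, the polygon described above can be computed by tracing the upper envelope left to right and the lower envelope back. This is a sketch under the assumption of one convex range per series point; real plural values may carry multiple ranges.

```python
def uncertainty_band(series):
    """Build the outline polygon for an uncertainty line graph.

    `series` is a list of (low, high) intervals, one per x position;
    returns polygon vertices as (x, y) pairs: the upper edge
    left-to-right, then the lower edge right-to-left.
    """
    upper = [(x, hi) for x, (lo, hi) in enumerate(series)]
    lower = [(x, lo) for x, (lo, hi) in reversed(list(enumerate(series)))]
    return upper + lower
```

The resulting vertex list can be handed to any polygon renderer, with the interior textured by sampling the degrees of certainty as the text suggests.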
Parallel plots plot multiple dimensions individually in sequence and use line seg-
ments to join points belonging to the same data unit together. There are two common
approaches to uncertainty in parallel plots. The first uses blurring or opacity to indi-
cate uncertainty [39], the second uses a third dimension [91]. The first technique can
be achieved in much the same way as the line graph given above. However, the second
method is slightly more complicated, as the sampling of the uncertainty now needs to
produce a geometric shape. Thus, instead of sampling the uncertainty space to deter-
mine the texture, the degree of certainty is instead used as a height value. A convex
hull is formed between the height values at this dimension and the next.
6.3.4 Alternative Models

For most users the UUM and DUM will provide sufficient granularity over the uncertainty space for visualization. However, several further models can be used that expand
on these. There are two basic approaches to devising uncertainty abstraction models.
The first is to further differentiate the views of uncertainty, using the Quad Uncertainty
Model. The second approach is to apply recursion. We briefly discuss these options
below.
6.3.4.1 The Quad Uncertainty Model
The Quad Uncertainty Model (QUM) offers an even greater level of detail than either
the DUM or UUM. Both the possibilistic and probabilistic views can be further detailed
by separating necessity from possibility and belief from probability. This results in four
uncertainty degree functions: possibility, necessity, plausibility, and belief. Figure 6.4
shows this relationship between the variable, the possibilistic and probabilistic views,
and the four plural values.
Figure 6.4: Illustration of the Quad Uncertainty Model (a variable splits into possibilistic and probabilistic views, which in turn split into possibility, necessity, probability, and belief plural values)
The QUM may be appropriate for sophisticated uncertainty modeling users who
require the ability to distinguish these views. The mathematical relationships between the four views follow. For probability, bel(U) ≤ pl(U) and pl(U) = 1 − bel(Ū) (e.g. Dempster-Shafer, see [107]). For possibility, nec(U) ≤ pos(U) and nec(U) = 1 − pos(Ū), where Ū denotes the complement of U.
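These dualities can be checked on a toy Dempster-Shafer example. The frame of discernment and the mass assignment below are invented for illustration only.

```python
def belief(mass, A):
    """bel(A): total mass of focal elements entirely contained in A."""
    return sum(m for focal, m in mass.items() if set(focal) <= set(A))

def plausibility(mass, A):
    """pl(A): total mass of focal elements that intersect A."""
    return sum(m for focal, m in mass.items() if set(focal) & set(A))

# Hypothetical frame and basic mass assignment (masses sum to 1).
frame = {"rain", "dry", "snow"}
mass = {frozenset({"rain"}): 0.5,
        frozenset({"rain", "snow"}): 0.3,
        frozenset(frame): 0.2}

A = {"rain", "snow"}
# bel(A) <= pl(A), and pl(A) = 1 - bel(complement of A)
```

On this example bel(A) = 0.8 and pl(A) = 1.0, consistent with both relations above.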
The disadvantage of detailed uncertainty models like the QUM is that each additional view of uncertainty requires additional visualization considerations, which places a burden on the creators of visualization techniques. The strength of multiple views of uncertainty is that they can be shown concurrently within a single visualization. For example, Lowe et al. [70] incorporated both belief and plausibility for time
series visualizations.
6.3.4.2 Recursion
The models explored thus far (UUM, DUM, QUM) can be composed recursively. This
is similar to type-2 fuzzy sets [77], where the degree of certainty is itself subject to a
degree of certainty. Figure 6.5 illustrates this for a recursive UUM. For each possible
collapse on the real number line, there is a degree of certainty. Each degree of certainty
is itself only certain to a degree, as indicated by the height and color. Such recursion
can theoretically continue ad infinitum, although it quickly becomes impractical.
Figure 6.5: Example of a Recursive UUM (axes: real number line, degree of certainty, and degree of certainty of the degree of certainty)
This type of composition is helpful when dealing with uncertainty modeling tech-
niques that include this information. However, for many of the standard uncertainty
models this information is not available.
Recursive composition of abstract uncertainty models will likely find limited use,
since few real-world problems require this level of detail. However, for application
domains that do adopt this level of detail, the recursive composition offers the same
advantages for visualization as the non-recursive counterparts do for information un-
certainty visualization.
6.4 Case Study: User-Objectives in Financial Decision Support

In this case study, we construct visualizations according to user-objectives for a financial model incorporating uncertainty. We show how the visualizations can give the
investor a clearer understanding of the investment from various perspectives. The fi-
nancial model used is for an investor looking to buy a house and sell it again within the
next twenty years. The investor is hoping to earn rental from the property at all times
throughout possession and is aiming for a substantial appreciation in the value of the
property at the time of sale.
There are numerous variables to be considered by the investor both at the time of
purchase and throughout the life of the investment. These include the purchase price
and deposit amount; the loan and debt interest rate; the investor's salary and rental return from the property; the subsequent effects of taxation (based on depreciation, other deductions and rental income); and other miscellaneous expenses. There are uncertainties associated with all these variables, both initially and accumulated over time.
Variables such as salary are known with certainty at the time of purchase but become
increasingly uncertain as time progresses.
The case study addresses the following questions:
• How will changes in interest rates affect profitability? (possibility objective)
• How will changes in house prices affect profitability? (possibility objective)
• How will changes in interest rates and house prices affect profitability? (possi-
bility objective)
• What is the most likely profitability resulting from changes in interest rates?
(reliability objective)
• What is the likely profitability resulting from changes in interest rates and house
prices? (reliability objective)
• What is the likelihood of making $100k by the 20th year? (structure discovery
objective)
• What effect do interest rate changes have on profitability over the next 5, 10, 15
and 20 year periods? (relevancy objective)
• In the expected future climate of changing interest rates and house prices, what
is the optimum time to sell the house in order to maximize profits? (relevancy
objective)
How will changes in interest rates affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability that would result
from changes in the interest rate.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which interest rates might vary. No other likelihood or
probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly delineated.
Discussion The graph in Figure 6.6 shows the complete range of scenarios superim-
posed on the same axes. The top edge represents the optimistic case where interest
rates fall by 0.5% per annum, whereas the bottom edge represents rising interest rates.
As can be seen from the graph, the impact on profitability varies substantially. This
graph shows the extent to which changes in the interest rate can affect the overall prof-
itability of the investment.
Figure 6.6: Possible effects of interest rate movements on NPV (2D).
How will changes in house prices affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability that would result
from changes in the house price.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which house prices might vary. No other likelihood or
probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly marked.
Discussion This question requires a possibility objective similar to Question 1. In
this instance, the model is run for a number of scenarios to produce a discrete set of
forty possibilities. Each scenario is superimposed upon the same axes and the results
are shown in Figure 6.7. Lines are colored by annual property price movements.
The graph shows that the investment is worthwhile for most positive average in-
creases in property value. Color bands mark ranges of house price movement. It is
important to note that this coloring does not relate to uncertainty and is not connected
to the possibility objective. Positive value changes are colored green, neutral changes
are colored gray, and negative changes are colored red.
The same information can be portrayed in three dimensions, as in Figure 6.8. The
negative NPV volume is shaded transparent blue to give a clear separation between
positive and negative NPV values.
How will changes in interest rates and house prices affect profitability?
User Objective - Possibility This question requires a possibility objective visualiza-
tion as the user is interested in the possible changes in profitability, only this time from
changes in two dimensions: the interest rate and house prices.
Knowledge of Uncertainty - Non-specificity The knowledge of uncertainty required
is of the potential range over which interest rates and house prices might vary.
No other likelihood or probabilities need to be considered.
Display Technique - Possibility The user will wish to see a display of all outcomes,
where each outcome is equally visible. Thereby the scope of all possible outcomes is
clearly delineated.
Discussion To answer this question, we introduce the effect of changes in interest
rates. Figure 6.9 extends the previous visualization by adding a wire-frame outline
of a volume. The volume shows the possible NPV for interest rate changes of up to
±0.5% per annum change. The surface inside the frame shows the NPV for interest
rates remaining constant. Note that the scale of the NPV axis has been increased
from ±$100k to ±$200k. This volume provides the user with an indication of the
extent to which NPV can vary, but many of these possibilities have a low probability
of occurring. The surface inside the volume represents a scenario where interest rates
remain unchanged and is included to provide the user with a reference.
Figure 6.7: Possible effects of house price movements on NPV (2D).
Figure 6.8: Possible effects of house price movements on NPV (3D).
Figure 6.9: Possible effects of house prices and interest rates on NPV.
What is the most likely profitability resulting from changes in interest rates?
User Objective - Reliability This question requires a reliability objective as the user
is interested in the most likely outcomes of NPV arising from changes in interest rates.
Knowledge of Uncertainty - Probability Since NPV is based on interest rates in
this model, this question requires the probability that interest rates change.
Display Technique - Reliability The user will wish to see a display of all outcomes,
where each outcome is visualized according to its associated probability. Thereby the
set of the most likely, or reliable outcomes are highlighted, with less likely outcomes
being less visually obvious.
Discussion The answer to this question requires a reliability visualization, since the
viewer is interested in the most likely outcome. The graph in Figure 6.10 shows
changes in interest rates, with more likely outcomes being more opaque. From this
graph the viewer can determine that the most likely outcome is positive, since most of
the visibly shaded area ends above the zero line.
Figure 6.10: Most likely profitability resulting from changes in interest rates
What is the likely profitability resulting from changes in interest rates and house
prices?
User Objective - Reliability This question requires a reliability objective visualiza-
tion as the user is interested in the likely outcomes of NPV arising from changes in
interest rates and house prices.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over which interest rates and house prices can vary.
Display Technique - Reliability The user will wish to see a display of all outcomes,
where each outcome is visualized according to its associated probability. Thereby the
set of the most likely outcomes are highlighted, with less likely outcomes being less
visually obvious.
Discussion The answer to this question makes use of the probabilities in the data
model and is shown in Figure 6.11. Interest rates are expected to remain constant at
8% while the median long term house price rise is 3% per annum. We construct the
same volume that was outlined in Figure 6.9, but map the alpha value to the normalized
probability of the event. The wire-frame outline has also been added to provide con-
text. In addition, the color is red when the NPV is negative and green otherwise, which
helps distinguish positive NPV. We can determine from this figure that the most likely
outcome is positive, although there is a significant and persistent chance of a negative
result. To aid visibility, positive NPV is mapped to light green, while negative NPV
is strong red. These color choices were arbitrary and can be changed, for example, to
accommodate color-blind viewers.
The strong negative presence in the early years is due to upfront costs being in-
curred.
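The mapping described in this discussion can be sketched as a small Python function. This is an illustrative sketch only; the function name is ours, and IvySheet's actual visual mapping is performed through its visualization sheet rather than a hard-coded routine.

```python
def scenario_style(npv_value, probability, max_probability):
    """Reliability-style mapping sketch: opacity (alpha) encodes the normalized
    probability of the scenario, while hue encodes the sign of its NPV."""
    alpha = probability / max_probability if max_probability > 0 else 0.0
    color = "light green" if npv_value >= 0 else "strong red"
    return color, alpha
```

Scenarios with the modal probability are rendered fully opaque, while improbable scenarios fade toward invisibility, which is exactly the effect visible in Figure 6.11.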
What is the likelihood of making $100k by the 20th year?
User Objective - Structure Discovery For this question, the user requires the effect
of an uncertain variable (interest rates) and its effect on the NPV to be made explicit.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over the range in which interest rates can vary.
Display Technique - Structure Discovery In this question, the viewer will wish to
see a display of all outcomes and some measure of their associated certainty. It is
appropriate in this case to include more information about the structure of the uncer-
tainty across the range of outcomes, because the viewer has explicitly asked for such
an understanding.
Discussion The answer to this question requires a structure discovery objective visu-
alization, since the viewer is interested in the degree of uncertainty. The graph shown
in Figure 6.12 maps the degree of uncertainty to color. By observing the color at the
crossing of the $100k and 20 year point, the viewer can determine that the likelihood
is small.
Figure 6.11: Volumetric representation of the most likely effect interest rate changes will have on NPV.
Figure 6.12: Likelihood of NPV
What effect do interest rate changes have on profitability over the next 5, 10, 15
and 20 year periods?
User Objective - Relevancy This question requires a relevancy objective as the user
is interested in limiting the information returned for changes in profitability from vary-
ing interest rates.
Knowledge of Uncertainty - Probability The knowledge of uncertainty required is
of the probability over the range in which interest rates can vary.
Display Technique - Relevancy The user will wish to see a display of only relevant
outcomes, where each outcome is equally visible. Thereby the scope of all possible
outcomes is reduced to only what is relevant.
Discussion This question is best addressed with a relevancy objective, as the user
is only interested in a limited amount of information that is not intuitively available.
Figure 6.13 shows a graph that summarizes the extent of the effect into five year incre-
ments. The criterion function assigns relevance by comparing extents for
neighboring years. The reduction process aggregates less relevant information such
that the maximal extents remain. The threshold is set to display four groups. An inter-
active system could allow the user to change the number of groups arbitrarily, or alter
the criterion function.
Figure 6.13: Effect of interest rate changes grouped into 5 year periods
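The reduction process described above can be sketched in Python. The function name and the fixed-size grouping criterion are ours; the thesis's criterion function compares neighboring years and is user-alterable, whereas this sketch simply keeps the maximal extent within each group.

```python
def group_extents(yearly_extents, group_size=5):
    """Relevancy-style reduction sketch: aggregate per-year (low, high) NPV
    extents into fixed-size groups so that only the maximal extents remain."""
    groups = []
    for i in range(0, len(yearly_extents), group_size):
        chunk = yearly_extents[i:i + group_size]
        groups.append((min(lo for lo, _ in chunk),
                       max(hi for _, hi in chunk)))
    return groups
```

An interactive system could expose `group_size` (and the criterion itself) as the user-adjustable parameters mentioned above.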
In the expected future climate of changing interest rates and house prices, what
is the optimum time to sell the house in order to maximize profits?
User Objective - Relevancy This question requires a relevancy objective as the user
is interested in highlighting relevant times to sell. Furthermore, the user wishes to
associate the likelihoods of these scenarios.
Knowledge of Uncertainty - Probability The knowledge of uncertainty for this
query is of the probability over which interest rates and house prices can vary.
Display Technique - Relevancy The user will wish to see a display of only relevant
outcomes, i.e. those that highlight the optimal selling time.
Discussion This question suggests a relevancy objective since the user wishes to filter
the information by appropriate times to sell. Figure 6.14 indicates the optimum year to
sell by elevation, with immediate sale at the top and holding the property for the full
20 years at the bottom. The surface is colored according to the optimum selling time
to improve visibility, meaning that color matches elevation. The lower right corner
represents falling interest rates and high property value growth.
The yearly data is assigned a relevance according to the distance between NPV and
a target value. We chose +∞, which represents absolute profit. An alternative target
value could be based on yearly growth rate, such as 1% per annum.
Areas that indicate immediate sale (at time 0) are due to an unattractive investment
proposition (circled A). The plateau at year one represents scenarios where the upfront
costs have been partially recouped, but the investment remains poor over the long term
(circled B). The small plateau at five years is due to the tax effects of depreciation
deductions ending (circled C).
6.5 Case Study: Relevancy Objective in Business Process Management
Business Process Management is a field that encompasses management and informa-
tion technology. It includes methods, techniques and tools to design, enact, control,
and analyze operational business processes involving humans, organizations, appli-
cations, documents and other sources of information [121]. Central to this field are
the modeling languages that specify the processes, scheduling, interactions, and other
information. Graphical business process modeling languages are elegant solutions
because the user can visually interpret the process. There are many graphical busi-
ness process modeling techniques, such as YAWL [120]. For a detailed discussion
of graphical modeling languages see [119, pp. 3]. However, as the business process
grows in size, the graphical representation becomes difficult to deal with. This prob-
lem is well known to fields that use graphical languages (see e.g. [12]). While zooming
initially solves the issue of gaining an overview perspective, there is a finite limit to
the amount of zooming that can be performed before information becomes obscured.
For large models to be understood it is necessary that the level of controlled visual
processing required is reduced. Controlled visual processing refers to those activities
that require cognitive functions, such as the interpretation of text. The approach
explored here is to provide views of the specification that exclude less relevant infor-
mation. This filtering of information produces a model with lower complexity, but
thereby introduces a degree of uncertainty. This uncertainty reflects the lower resolu-
tion model’s potential for representing variations of the original model.
The aim is to construct a reduced representation for a given input specification.
Figure 6.15 shows the architecture of the system. The original graph is filtered
according to a criterion function by a BPM reduction technique to produce a reduced graph,
which is presented to the user. The user then inspects the graph and alters the param-
eters, completing a feedback loop. Figure 6.16 shows a screen shot of the prototype
system, showing the graphical user interface controls available to the user.
We aim to build a reduced graph G_R(V_R, E_R), where the R subscript denotes reduction.
The reduced graph is built such that it contains a subset of the nodes of the original
graph. In other words, a reduced graph G_R(V_R, E_R) is built from an original graph
G(V, E), such that V_R ⊂ V. A relevance factor, ε, is calculated by ε_i = C(v_i) for each
node v_i ∈ V, where C is the criterion function C : V → R and R is the set of real
numbers. C orders the nodes according to their relevance to the task of the user.
Preservation of the overall structure of the graph is achieved through identifying
important control flow nodes. The control perspective defines the flow of control
through the graph. A promising structural importance heuristic is based on the connectedness,
χ : V → Z, of the node, and its estimated position in the routing hierarchy,
φ : V → Z. Here Z is the set of integers, and φ is calculated by counting the number of
splits and subtracting the number of joins on the shortest path from s to the node,
excluding the node itself. χ is simply the number of nodes connected to this node.
ε is calculated as follows:

ε_i = χ(v_i) / min(φ(v_i), 1)
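The heuristic above can be sketched in Python. This is an illustrative sketch, not the thesis implementation: the function name is ours, and because the formula divides by min(φ, 1), which is zero or negative near the start node, this sketch assigns infinite relevance in that case, a guard the thesis leaves unspecified.

```python
from collections import deque

def structural_relevance(succ, splits, joins, s):
    """Sketch of the structural heuristic: eps_i = chi(v_i) / min(phi(v_i), 1).

    succ   : dict mapping each node to its list of successors (directed edges)
    splits : set of split nodes; joins : set of join nodes
    s      : the start node of the process graph
    """
    # chi: connectedness, i.e. the number of edges touching the node
    chi = {v: 0 for v in succ}
    for v, ws in succ.items():
        for w in ws:
            chi[v] += 1
            chi[w] += 1
    # phi: splits minus joins along the shortest path from s, excluding the node itself
    phi = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in phi:
                phi[w] = phi[v] + (v in splits) - (v in joins)
                queue.append(w)
    eps = {}
    for v in succ:
        denom = min(phi.get(v, 0), 1)
        # guard: a non-positive denominator (nodes at the top of the routing
        # hierarchy) is treated as maximal relevance in this sketch
        eps[v] = chi[v] / denom if denom > 0 else float("inf")
    return eps
```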
Another approach is to rank nodes according to a text retrieval algorithm. To im-
prove context, we introduce a notion of relevance flow, which increases the relevance
of nearby nodes. The amount of the contribution drops off with distance traveled,
including loops. The amount of the drop off is arbitrary and a constant rate, β , gives
adequate results. The algorithm that assigns relevance factors based on a text search term
for graph G(V, E) is given as Algorithm 3.
Once the degree of relevancy has been determined, the business process specifi-
cation needs to be reduced. The reduced graph must preserve the semantics of the
original graph to avoid being misleading. Semantics are preserved if all possible or-
ders of execution of the remaining nodes are unchanged from the original graph.
106 Chapter 6. Uncertainty Abstraction for Visualization
Figure 6.14: Optimum time to sell the property under different economic conditions.
[Diagram: Original Graph → (1) Criterion Function → (2) BPM Reduction → Reduced Graph → (3) Presentation → User, with a feedback loop from the user back to the parameters]
Figure 6.15: Architecture of the case study system
Figure 6.16: YAWL query: Prototype tool for the graphical business specification reduction
Algorithm 3 Text Retrieval Relevancy in Business Process Specifications
* Find S_T, the set of all nodes that contain the search term.
foreach v ∈ S_T
    * Initialize the contribution value, c ← 1.
    * Initialize the neighborhood node set, S_N ← {v}.
    while c > 0 and S_N ≠ ∅
        * Update ε for all neighbors: ε′(n) ← ε(n) + c for all n ∈ S_N.
        * Reduce future contributions: c′ ← c − β.
        * Update the neighbor list: S_N ← {w ∈ V : n ∈ S_N, {n, w} ∈ E}.
    endwhile
endfor
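Algorithm 3 can be sketched in Python as follows. The function name is ours, and one assumption is labeled explicitly: this sketch visits each node at most once per seed, whereas the thesis allows contributions to travel around loops as long as the decaying contribution remains positive.

```python
def text_relevance(succ, labels, term, beta=0.5):
    """Relevance-flow sketch: nodes whose label contains the search term seed a
    contribution that spreads to neighboring nodes, decaying by beta per hop.
    Assumption: each node receives at most one contribution per seed (no
    re-entry through loops), a simplification of the thesis algorithm."""
    eps = {v: 0.0 for v in succ}
    neighbors = {v: set() for v in succ}  # relevance flows along edges both ways
    for v, ws in succ.items():
        for w in ws:
            neighbors[v].add(w)
            neighbors[w].add(v)
    seeds = [v for v in succ if term.lower() in labels.get(v, "").lower()]
    for v in seeds:
        c = 1.0
        frontier = {v}
        visited = set()
        while c > 0 and frontier:
            for n in frontier:
                eps[n] += c
            visited |= frontier
            c -= beta
            frontier = {w for n in frontier for w in neighbors[n]} - visited
    return eps
```

With β = 0.5, a matching node contributes 1.0 to itself and 0.5 to its immediate neighbors before the contribution is exhausted.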
Two methods are described: the collapse method, which incrementally reduces the
graph until a threshold value for ε is reached, and the decimation method, which re-
moves all nodes below a threshold value and reconstructs the paths between remaining
nodes. The threshold value is assigned by the user and is called the alpha-cut value,
denoted α .
The principle behind the collapse technique is to incrementally reduce the graph
using the production rules shown in Figure 6.17. Each incremental change to the graph
is selected on the basis of removing the least relevant node (minimum ε) from the
current model G_R^n to produce the next, G_R^{n+1}. The following conditions must be
met to ensure well-formed results. A non-join node is selected for removal at each
increment. A split node is only selected if its predecessor is a task. The removal is
performed by merging the node with its predecessor. Split and join decorators are
removed from a node when a single inflow or outflow, respectively, results from the
collapse. Removing the split and join decorators yields a sequence operation.
One advantage of the collapse technique is that the order of collapses can be stored.
The inverse operation of a collapse, called a node-split, can then be performed to
restore G_R^{n+1} to G_R^n. This is similar to progressive meshes [43] in computer graphics.
Another advantage is that since collapses relate one level of detail to another, the pre-
sentation can animate changes to increase interpretability of the technique. The cal-
culation of collapses can be performed in a pre-processing step and since the actual
collapse operation requires minimal processing, the visualization system can allow in-
teractive navigation between various levels of detail.
The decimation approach selects a number of nodes that will be included in GR. All
other nodes are removed. The original graph is then analyzed to reconstruct the paths
[Table: production rules for the patterns Sequence, Selection, Parallel, Multi-choice, and Iteration, each shown in its original YAWL form and its reduced form (introducing ε)]
Figure 6.17: Production rules for reducing a YAWL graph
between the remaining nodes. Nodes are selected for inclusion if their relevance is α
or higher. A concurrent path is defined as any path from one node to another where a
split exists on the path that was not synchronized before reaching the destination node.
A direct path from x ∈ S to y ∈ S is a path from x to y without going through any other
element of S. The decimation-construction algorithm is given as follows:
Algorithm 4 Business Process Specification Decimation Algorithm
* Initialize the set of included nodes, S_I ← {s, t}.
* Add all v_i where C(v_i) > α to S_I.
* Initialize the output edges, E_R ← ∅.
for x ∈ S_I, y ∈ S_I, x ≠ y
    * V_R ← V_R ∪ {y}
    if there is a direct path from y to y
        * E_R ← E_R ∪ {yy}
    endif
    if there is a direct path from x to y
        * E_R ← E_R ∪ {xy}
    endif
    if a concurrent path {x..y} includes any z ∈ S_I (z ≠ x, z ≠ y)
        * Add the offending split node(s) before x and y to S_I.
        * Add the matching join node(s) after x and y to S_I.
    endif
endfor
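The core of the decimation approach, reconstructing direct paths between the retained nodes, can be sketched in Python. This is an illustrative sketch under stated assumptions: the function name is ours, and the concurrency-preserving step of Algorithm 4 (re-adding offending split and join nodes) is omitted.

```python
def decimate_edges(succ, included):
    """Decimation sketch: keep only the `included` nodes and add an edge
    x -> y whenever the original graph contains a direct path from x to y,
    i.e. one that passes through no other included node."""
    edges = set()
    for x in included:
        # depth-first search from x, expanding only through non-included nodes
        stack = list(succ.get(x, []))
        seen = set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in included:
                edges.add((x, v))       # reached without crossing another included node
            else:
                stack.extend(succ.get(v, []))
    return edges
```

For example, decimating the chain a → b → c → d down to {a, d} yields the single edge a → d, while retaining c as well blocks the long edge and yields a → c and c → d instead.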
A large business process is shown in Figure 6.18. This graph represents a simpli-
fied version of an actual business process used by an insurance company. Figure 6.19
shows the same graph after a significant collapse-based general simplification. This
reduced graph shows just 13% (6/44) of the nodes, but preserves important structural
features. Animating from the original to the reduced specification aided comprehen-
sion. Figure 6.20 shows a graph that was reduced according to relevancy to the text
“legal” and using the decimation technique. From this graph it is possible to observe
the relationship between relevant nodes.
Figure 6.18: Original graph prior to simplification
Figure 6.19: Reduced specification using collapse approach (α = 2.5)
Figure 6.20: Reduced specification for text query “legal” using decimation approach (β = 0.5, α = 1)
CHAPTER 7
Integration of Core Features
7.1 Introduction
The previous chapters have described the ingredients for managing uncertainty in mod-
eling and visualization. This chapter discusses the architecture that integrates these
ingredients into a functional whole. The result is a coherent and integrated platform
for information uncertainty modeling and visualization.
There are a number of questions that need to be addressed when designing an
architecture to integrate the ingredients. Which components are responsible for
storing the uncertainty details, and how can these be extended? How can users create
mappings between the abstraction models and visual elements? This chapter presents
a design and architecture that answers these questions.
We have developed a prototype system called IvySheet¹ that implements the architecture
presented in this chapter. IvySheet is built using Sun Java Standard Edition
5.0 [1] and has been successfully tested on Microsoft Windows XP SP2, Ubuntu Linux
7.04, and Apple Mac OS X 10.5 (Leopard). It was used for the case studies in Sections 4.6 and
6.4, and the studies in Chapter 9. An extension to IvySheet called GPGPUSheet was
also developed, which is described in Section 8.4. This extension demonstrates the
extensibility of the architecture described here.
This chapter is organized as follows. Section 7.2 discusses design considerations.
Section 7.3 then presents the architecture for an integrated system with illustrations
from the IvySheet prototype.
¹The name IvySheet is derived from “Information Visualization Spreadsheet”.
7.2 Design Considerations
This section describes the design of the system. The aim of the design is to be ex-
tensible, flexible, and intuitive to use. Users are able to model and visualize their in-
formation within the same system and it automatically propagates uncertainty details
throughout the model, including to the visualizations. Uncertainty details are protected
against becoming separated, which reduces common user mistakes. Changing uncer-
tainty modeling techniques is a simple process, requiring only a change to a single
field.
The user interface is designed to be familiar to users of current commercial spread-
sheet systems. The design for the user interface is shown in Figure 7.1. The menu bar
provides access to various commands, such as renaming or saving the current spread-
sheet. The text interface area beneath the menu bar is used to enter and edit cell con-
tents. These are entered as text strings, which are then parsed and converted to a cell
of the appropriate type. A workbook contains several sheets and these are accessed
by tabs bearing their name. The scrollable sheet view allows users to interact with the
currently active sheet. When one of the sheet selection tabs is clicked, it brings the
appropriate sheet into view. The currently active cell is indicated by a cursor, which
users control using either the mouse or the keyboard.
[Diagram labels: menu bar (File, Sheet, Cell, Plugins), text interface area, sheet selection tabs, scrollable sheet view, currently active cell]
Figure 7.1: Design of the User Interface
The sheet view consists of a matrix of cells. The cells are addressable using letters
(A..Z, then AA..ZZ, and so on) for the column and a number for the row. For example,
the top left cell is “A1” and the cell in the third column of the 50th row is “C50”. Each
cell can contain data, including any associated uncertainty information. The drawing
of a cell in this view is handled by the cell type object, which is also responsible for
storing the information.
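The cell addressing scheme described above can be sketched in Python. This is an illustrative sketch of standard bijective base-26 spreadsheet addressing; the function names are ours, not part of IvySheet.

```python
def column_index(letters):
    """Convert a column label (A..Z, then AA..ZZ, and so on) to a 0-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch.upper()) - ord("A") + 1)
    return n - 1

def parse_address(address):
    """Split an address such as 'C50' into 0-based (column, row) indices."""
    i = 0
    while i < len(address) and address[i].isalpha():
        i += 1
    return column_index(address[:i]), int(address[i:]) - 1
```

Thus “A1” maps to the top-left cell (0, 0), and “C50” to column 2, row 49.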
Internally, the system consists of four core components and an unlimited number
of plug-in components. The main core component is the Kernel, which forms the inter-
face between all other components. This relationship is illustrated in Figure 7.2. The
other core components are the Language component, which is responsible for formula
processing; the Dependency component, which manages a graph of cell dependencies
that arise as a result of formula use; and the Spreadsheet Datastructure, which holds the
cell objects in an addressable matrix data structure. The Language component includes
the uncertainty propagation models and uncertainty abstraction models.
[Diagram: the Kernel at the center, connected to the Spreadsheet Datastructure, the Dependency Component, the Language Component, and the Plugins]
Figure 7.2: The Spreadsheet Architecture
There are three main tasks that the user performs:
1. Selecting the currently active cell
2. Altering the currently active cell, either by:
(a) Editing a textual description,
(b) Using a custom editing tool supplied by a plugin,
(c) Deleting the contents of the currently active cell, or
(d) Using Cut/Copy/Paste to remove/move/copy the currently active cell
3. Adding/Removing/Altering the sheets in the Spreadsheet Datastructure
When the contents of a cell changes there are two tasks that the system performs.
First, if it is a textual edit, then the new text is parsed by the cell types to build a new
cell object. Next, the dependency component is notified of the change, which causes
affected cells to be recalculated.
All of the core components are required for the system to function. In contrast,
plug-ins are optional and can be loaded at any time to build up the functionality of
the spreadsheet. Plug-ins are used to introduce new cell types, language functions,
uncertainty propagation and abstraction models, and visual elements. Additionally,
plug-ins can add items to the menu, referred to as user aids. For example, saving,
loading, cut, and paste operations are handled by user aids.
The advantage of this design is that it is flexible, extensible, and intuitive. The
visualization sheet provides a flexible mechanism for mapping information to visual
elements, enabling users to explore using all of the benefits that the formula language
provides. This approach keeps the interface consistent and brings the power of formu-
lae to the visual mapping process. The system is built around extensibility. All non-
essential functions are provided by plug-ins, including all of the features required to
provide new uncertainty modeling techniques. Finally, the operation of the spreadsheet
is designed to be consistent with the existing spreadsheet paradigm. This increases the
intuitiveness for new users as much of their existing knowledge can be applied.
7.3 Architecture
This section details the architecture of IvySheet, which follows the design given in the
previous section. The overall system can be divided into three main parts²: the core
components, the user interface, and the plugin components. The core components pro-
vide the essential spreadsheet infrastructure; the user interface provides an interface
for the user to interact with the system; and the plugin components provide function-
ality for cell data types, formula operations, etc. Figure 7.3 illustrates this high-level
architecture using Unified Modeling Language (UML) notation [8].
[UML packages: user.interface, core, plugins]
Figure 7.3: High-level Architecture
7.3.1 User Interface
The user interface consists of the main window and the scrollable spreadsheet view.
The main window contains the text field and sheet selectors. The scrollable spreadsheet
view is a sub-window of the main window and displays the ruler and grid of cells. It
consists of two classes, the display class and the controller class. The display class
is used to manage the window output, while the controller class is responsible for
responding to user input. This follows the Model-View-Controller architecture that is
used by Java’s Swing³.
²In keeping with Java naming conventions [1], the names of these components are in lower case.
³Java Swing is the user interface library that IvySheet uses.
(a) View (b) Controller
Figure 7.4: View and Controller Classes for the Spreadsheet
Figure 7.4 shows the UML diagrams for the view and controller classes that manage
the spreadsheet view. The UISpreadSheet class is only responsible for drawing the
grid and rulers. The actual cell contents are drawn by the cell type classes themselves,
which are added by plug-ins.
7.3.2 Core Components
The core components consist of the kernel object and three main components: the data
model component; the dependency component; and the language component. The
kernel is responsible for loading and managing the plugin components.
Figure 7.5 shows a UML diagram of the core components. The datamodel com-
ponent holds the spreadsheet data structures. The information managed by this com-
ponent completely describes the current spreadsheet contents. The dependency com-
ponent manages the dependencies between cells, which are ordinarily created by for-
mulae. Cells are notified of changes to their dependencies, which gives them a chance
to update themselves. The dependency graph can be generated from the information
contained in the data model and therefore does not need to be stored to disk. The
language component manages the parsing and execution of formulae, look up of the
appropriate methods in the propagation models, and the cell addressing scheme. Each
of these components are described next.
7.3.2.1 The Kernel
There is only one Kernel object per running instance of the spreadsheet system. It
maintains a list of the available cell types and provides the interface for registering
[UML: the core package contains the Kernel together with the datamodel, dependency, and language components]
Figure 7.5: The Core Components
new cell types. It is accessed through two different interfaces: the IKernel interface
provides the standard communications functions that most components use, while the
IKernelRegister interface is only used when registering new plugins. The IKernelReg-
ister extends the IKernel interface and therefore is a superset, as shown in Figure 7.6.
Figure 7.6: UML Inheritance Diagram for the Kernel Class
The IKernel interface is used by running components. The IKernelRegister
interface is only used when registering new plugin components. It grants access to
methods that register new features of the system. Kernel is the actual kernel class. It
keeps a list of the novel cell types sorted in order of parsing priority. Cell types with a
higher parsing priority are checked first when parsing a string.
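The parsing-priority mechanism can be sketched in Python (IvySheet itself is written in Java; this sketch, its method names, and the tuple-based cell representation are ours, meant only to illustrate the registration and priority-ordered parsing idea).

```python
class Kernel:
    """Sketch of the kernel's cell-type registry: plugins register cell types
    with a parsing priority, and text entered into a cell is offered to each
    registered type in priority order until one accepts it."""
    def __init__(self):
        self._cell_types = []                       # (priority, parser) pairs

    def register_cell_type(self, parser, priority=0):
        """IKernelRegister-style call: used only when loading a plugin."""
        self._cell_types.append((priority, parser))
        self._cell_types.sort(key=lambda p: -p[0])  # higher priority checked first

    def parse(self, text):
        """IKernel-style call: build a cell object from a text string."""
        for _, parser in self._cell_types:
            cell = parser(text)
            if cell is not None:
                return cell
        return None
```

For example, registering a number type at a higher priority than a plain-text type makes "42" parse as a number while any non-numeric string falls through to text.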
7.3.2.2 The Datamodel Component
The Workbook holds the cells of the spreadsheet. This forms all of the data necessary
for persistent storage. Since the dependency graph can be derived from the formulae,
it is not stored by the spreadsheet data structure and is instead held by the dependency
component. The data structure is as follows:
Workbook = {SheetName → Sheet}
Sheet = X × Y → Cell
Cell = NIL | CellType

where CellType is the base class for all cell objects. The Sheets are sparse, meaning
that they can contain empty cells.
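The sparse data structure above can be sketched in Python (the class and method names are ours; IvySheet realizes this in Java with Workbook, SpreadSheet, and CellContainer classes):

```python
class Workbook:
    """Sketch of the sparse data model: Workbook = {SheetName -> Sheet},
    Sheet = (x, y) -> Cell, with absent keys standing for NIL (empty) cells."""
    def __init__(self):
        self._sheets = {}

    def sheet(self, name):
        # sheets are created on first access in this sketch
        return self._sheets.setdefault(name, {})

    def set_cell(self, name, x, y, cell):
        self.sheet(name)[(x, y)] = cell

    def get_cell(self, name, x, y):
        return self.sheet(name).get((x, y))   # None plays the role of NIL
```

Because only occupied coordinates are stored, an almost-empty sheet costs almost nothing, which is what makes the sparse representation attractive.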
Figure 7.7 shows the main classes in the datamodel component. These are the
Workbook, which contains multiple SpreadSheets, which contain an array of CellCon-
tainers. CellContainers hold three optional units of information: the cell contents, the
overridden attributes, and a runtime reference to the dependency graph node. If the
contents is nil, then this cell is empty. However, it may still have formatting attributes
applied to it, such as borders or shading.
Figure 7.7: Main Classes in the Datamodel Component
7.3.2.3 The Dependency Component
Functional relationships between cells are created by formulae. When a cell is updated,
then all dependent cells must also be updated. The dependency component manages
these relationships using a two-way linked graph. The downstream links point to cells
that are dependent on this cell, while the upstream links point to cells that this cell
is dependent on. Thus, downstream links only exist for a cell if it is referenced in a
formula elsewhere and the upstream links only exist for a cell if it contains a formula.
Figure 7.8 illustrates the relationship between the dependency graph, the cells, and
the cell containers of the spreadsheet. The dependency graph nodes are depicted as
octagons and have three members: their upstream nodes, which are those that this node
depends on; their downstream nodes, which are the nodes that depend on this one; and
the listener, which is an object that is notified when dependencies change. The cell
container in the datamodel component provides links from cells to their dependency
graph nodes.
Figure 7.8: Relationship of the Dependency Graph to Cells and CellContainers
When a cell is changed in the data model, it notifies the dependency node. The
dependency node then uses an algorithm similar to mark and sweep [97] to notify all
of its downstream nodes. Mark and sweep is an algorithm used in garbage collectors
to deal with issues such as circular links. While circular references are disallowed in
IvySheet, a common pattern is to have diamond shaped dependencies, as illustrated
in Figure 7.9. Cell C is dependent on B1 and B2, both of which are dependent on
A. Under a naive implementation, a change to A would result in C being recalculated
twice. The mark and sweep approach avoids this by ensuring that nodes are only
recalculated once.
Figure 7.9: Cell C is Dependent on A Multiple Times
The mark and sweep algorithm operates as follows. All downstream nodes and,
transitively, their downstream nodes are “marked” by setting a special flag on each
node. Then all nodes are “swept” in a three-step process:
1. If any upstream nodes are marked, sweep them first.
2. Notify the listener for the current node, then unmark the node.
3. Sweep all marked downstream nodes.
As an example, consider a change to cell A from Figure 7.9. It is marked dirty and
then recursively marks its downstream neighbors. B1 is the first neighbor, which is
marked dirty and recurses down to C. C is marked but has no downstream neighbors
and returns back to B1. B1 has no other downstream neighbors and returns back to A.
A’s next downstream neighbor is B2, which is marked dirty. B2’s downstream neigh-
bor, C, is already marked and therefore not marked again. Since both B2 and A have
no other downstream neighbors, the marking process is complete and the sweeping
process begins. A has no upstream neighbors and is therefore recalculated and un-
marked. A’s downstream neighbor B1 is swept next. B1’s only upstream neighbor is
unmarked, thus B1 is recalculated and then unmarked. B1’s downstream neighbor C
is next. However, C cannot be recalculated yet as one of its upstream neighbors, B2,
is marked. Therefore, B2 is swept first. B2’s upstream neighbor, A, is unmarked and
so B2 can be recalculated and unmarked. It is now possible for C to be recalculated
and unmarked. C returns to B1, which returns to A. A’s next neighbor, B2, is already
unmarked and is therefore not swept. This simple example demonstrates that each node
is recalculated only once.
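The walkthrough above can be condensed into a small Java sketch. The class and member names are illustrative, not IvySheet's actual classes, and a counter stands in for notifying the listener; the diamond dependency of Figure 7.9 then recalculates each node exactly once:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the mark-and-sweep recalculation order described above.
class DepNode {
    final String name;
    final List<DepNode> upstream = new ArrayList<>();
    final List<DepNode> downstream = new ArrayList<>();
    boolean marked = false;
    int recalcCount = 0;

    DepNode(String name) { this.name = name; }

    void dependsOn(DepNode other) {
        upstream.add(other);
        other.downstream.add(this);
    }

    // Phase 1: mark this node and, transitively, all downstream nodes.
    void mark() {
        if (marked) return;
        marked = true;
        for (DepNode d : downstream) d.mark();
    }

    // Phase 2: sweep marked upstream nodes first, then recalculate and
    // unmark this node, then sweep marked downstream nodes.
    void sweep() {
        if (!marked) return;
        for (DepNode u : upstream) if (u.marked) u.sweep();
        if (!marked) return; // already swept via an upstream node's downstream pass
        recalcCount++;       // "notify the listener"
        marked = false;
        for (DepNode d : downstream) if (d.marked) d.sweep();
    }

    // A change to a cell marks everything downstream, then sweeps.
    void changed() {
        mark();
        sweep();
    }
}
```

The second `marked` check inside `sweep()` is what prevents a node such as C from being recalculated twice when it is reached both through B1 and through B2's downstream pass.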
There is a single dependency graph object for the workbook. Figure 7.10 shows
the inheritance diagram for the DependencyGraph class. Other components inter-
act with the dependency graph through the IDependencyGraph interface. Formula-
calculated cells trigger the kernel to call addDependencies() to register their dependen-
cies. When a cell is deleted, clearCellDependency() is called. When a cell changes, no-
tifyCellChanged() ensures that dependent cells are updated. When loading a workbook
from disk, the notification system is turned off using setEnableFiring() for performance
reasons. Once the file has been loaded, the system is re-enabled and recalculateAll() is
used to ensure all cells are up to date.
Figure 7.10: UML Inheritance Diagram for the DependencyGraph Class
7.3.2.4 The Language Component
The language component is responsible for formula execution. Formulae form the
logic of the spreadsheet by creating functional relationships between cells.
There are two types of operators: prefix operators and infix operators. Prefix
operators are so named because the operator name comes first; for example, “add(1, 2)”
uses a prefix operator. For infix operators, the operator appears between the two
operands; for example, “1 + 2” uses an infix operator.
When a formula is parsed, it returns a CodeTree. A CodeTree is an n-ary tree data
structure that can be walked to gain a list of the cell references. Algorithm 5 performs
a post-order traversal of the tree to collect cell references. These are entered into the
dependency graph. The CodeTree is a collection of CodeNodes, with one designated
as the root:
CodeTree = { CodeNode[], ∧root }
CodeNode = LeafNode | { functionname, CodeNode[] }
LeafNode = functionname | constant | CellReference
Algorithm 5 Collect-Cell-References( x )
  cellrefs[] = ∅
  for all y in children do
    cellrefs[] += Collect-Cell-References( y )
  end for
  if x is a CellRef or CellRange then
    cellrefs[] += x
  end if
  return cellrefs[]
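This traversal might be sketched in Java as follows; RefNode is an illustrative stand-in for the thesis's CodeNode class, reduced to what the algorithm needs:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Algorithm 5: a post-order traversal of the code tree that
// collects cell references for entry into the dependency graph.
class RefNode {
    final String cellRef;           // non-null only for cell-reference leaves
    final List<RefNode> children;

    RefNode(String cellRef, List<RefNode> children) {
        this.cellRef = cellRef;
        this.children = children;
    }

    List<String> collectCellReferences() {
        List<String> refs = new ArrayList<>();
        // Visit children first (post-order), then this node itself.
        for (RefNode child : children)
            refs.addAll(child.collectCellReferences());
        if (cellRef != null)
            refs.add(cellRef);
        return refs;
    }
}
```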
A node either holds a function name, a constant value, or a cell reference. The tree
is evaluated from the bottom up, with child nodes forming the parameters to the func-
tion. Therefore nodes that contain constants or CellReferences are leaf nodes, which
do not have any children. If a leaf node contains a function, then that function has no
parameters, such as “getCurrentDate()”. The algorithm listed in Algorithm 6 is used
to evaluate a formula. It performs a post-order traversal of the tree to ensure the cor-
rect order of execution. Figure 7.11 shows an example of a CodeTree for the formula
“=5*A1+B7”. Rounded squares represent operations, circles represent constants, and
diamonds represent cell references.
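A minimal Java sketch of this bottom-up evaluation might look as follows. The class is a hypothetical simplification of the thesis's CodeNode, with plain doubles standing in for cell objects and only two operations defined:

```java
import java.util.Map;
import java.util.function.BiFunction;

// Sketch of a code tree node evaluated bottom-up: leaves are constants or
// cell references; interior nodes apply a function to their children's values.
class CodeNode {
    final String func;          // non-null for operation nodes
    final Double constant;      // non-null for constant leaves
    final String cellRef;       // non-null for cell-reference leaves
    final CodeNode[] children;

    static final Map<String, BiFunction<Double, Double, Double>> OPS = Map.of(
        "Add", (x, y) -> x + y,
        "Mul", (x, y) -> x * y);

    static CodeNode op(String name, CodeNode l, CodeNode r) {
        return new CodeNode(name, null, null, new CodeNode[]{l, r});
    }
    static CodeNode num(double v) {
        return new CodeNode(null, v, null, new CodeNode[0]);
    }
    static CodeNode ref(String addr) {
        return new CodeNode(null, null, addr, new CodeNode[0]);
    }

    private CodeNode(String f, Double c, String r, CodeNode[] ch) {
        func = f; constant = c; cellRef = r; children = ch;
    }

    // Post-order evaluation, as in Algorithm 6: children first, then the
    // node's own function applied to their results.
    double evaluate(Map<String, Double> cells) {
        if (constant != null) return constant;
        if (cellRef != null) return cells.get(cellRef);
        double left = children[0].evaluate(cells);
        double right = children[1].evaluate(cells);
        return OPS.get(func).apply(left, right);
    }
}
```

Building the tree for “=5*A1+B7” as `op("Add", op("Mul", num(5), ref("A1")), ref("B7"))` and evaluating it against a map of cell values reproduces the bottom-up order shown in Figure 7.11.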
Algorithm 6 Evaluate-CodeTree( x )
  if x contains a constant then
    return new cell from string
  else if x contains a cell reference then
    return referenced cell
  else
    parameters[] = ∅
    for all y in children do
      parameters[] += Evaluate-CodeTree( y )
    end for
    return func( parameters[] )
  end if

= 5 * A1 + B7 = (5 * A1) + B7 = Add( Mul(5, A1), B7 )
Figure 7.11: Example Formula and its CodeTree

A formula cannot produce a circular reference. In other words, it cannot depend
on a cell that depends on the result of this formula, because this would cause an
infinite loop. Attempts to create a circular reference should be detected and treated
as an error. The formula language definition is given in Extended Backus-Naur Form
(EBNF) in Figure 7.12.
Formula   = Element | Attribute
Element   = Const | CellRef | CellRange | Op
Attribute = ( Element ’.’ attribute-name ) | ( Element ’(’ attribute-name ’)’ )
Const     = string | number
CellRef   = [ sheet-name ’!’ ] column row
CellRange = CellRef ’:’ CellRef
Op        = PrefixOp | InfixOp
InfixOp   = Formula ( ’+’ | ’-’ | ’*’ | ’/’ ) Formula
PrefixOp  = funcname ’(’ Formula ’,’ Formula ’)’
Figure 7.12: Formula Language Definition
Figure 7.13 shows the public methods for the CodeTree class. The constructor
takes a formula string as its parameter, which it parses to produce the tree nodes.
The evaluate() method results in a cell object and is called to evaluate the CodeTree.
getAsString() converts the CodeTree back into string form. This is used to provide an
editable string representation to the user. The getDependencies() method returns a list
of cells that this formula depends on, which is used to update the dependency graph.
Figure 7.13: UML Diagram for the CodeTree Class
The language component manages the uncertainty propagation model. Figure 7.14
shows the model classes. The PropagationModel class manages a list of propaga-
tion methods. The PropagationModelSet combines multiple PropagationModel ob-
jects and presents them as a single list. Figure 7.15 shows the IPropagationMethod
interface, which represents a propagation method. The sole purpose of a propagation
method is to take a list of parameter cells and return the resulting cell.
Figure 7.14: UML Inheritance Diagram for the Propagation Models
Figure 7.15: The IPropagationMethod Class
7.3.3 Plugin Components

The plugin components are used to build the functionality of the spreadsheet system on
the foundation provided by the core components. They are responsible for adding cell
types, formula operations and propagation methods, visual elements, and menu items
(user aids). A plugin class is required to manage the loading process. It implements the
IPlugin interface and uses the methods provided by the IKernelRegistration interface
of the kernel object. Figure 7.16 shows the methods of the IPlugin interface. The
register() method returns a short human-readable name string and the getAboutString()
method returns a detailed description.
Figure 7.16: The IPlugin Interface
One of the principal uses for Plugins is to add cell types to the system. Cell type
objects are responsible for managing the storage and display of cell contents. All cell
type classes must implement the ICellType interface, which is shown in Figure 7.17.
The kernel object invokes the isValidString() method whenever users enter or edit the
cell contents. If the method returns true, the buildFromString() method will be invoked.
Otherwise, the kernel will repeat this test with the other cell types. The getPriority()
method is used to determine the order in which isValidString() is called. A cell type
with a higher priority will be called first.
Figure 7.17: The ICellType Interface
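The priority-ordered dispatch the kernel performs can be sketched as follows. The interface and handler names here are illustrative simplifications of ICellType, and the “10 +- 5” interval syntax is taken from the prototype's examples:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the kernel's cell-type dispatch: types are tried in priority
// order, and the first whose isValidString() accepts the input builds the cell.
interface CellHandler {
    int getPriority();                       // higher priority is tried first
    boolean isValidString(String s);
    Object buildFromString(String s);
}

class TypeRegistry {
    private final List<CellHandler> types = new ArrayList<>();

    void register(CellHandler t) {
        types.add(t);
        types.sort(Comparator.comparingInt(CellHandler::getPriority).reversed());
    }

    Object build(String s) {
        for (CellHandler t : types)
            if (t.isValidString(s))
                return t.buildFromString(s);
        return s; // fall back to treating the input as plain text
    }
}

// Example handler: accepts strings such as "10 +- 5" as intervals.
class IntervalHandler implements CellHandler {
    public int getPriority() { return 10; }
    public boolean isValidString(String s) { return s.contains("+-"); }
    public Object buildFromString(String s) {
        String[] p = s.split("\\+-");
        return new double[]{ Double.parseDouble(p[0].trim()),
                             Double.parseDouble(p[1].trim()) };
    }
}

class NumberHandler implements CellHandler {
    public int getPriority() { return 5; }
    public boolean isValidString(String s) {
        try { Double.parseDouble(s.trim()); return true; }
        catch (NumberFormatException e) { return false; }
    }
    public Object buildFromString(String s) { return Double.parseDouble(s.trim()); }
}
```

Because the interval handler is ranked above the number handler, “10 +- 5” becomes an interval, while “7” falls through to the number handler.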
The cell type objects are responsible for drawing the contents of their cell, which
is performed by the drawContents() method. The getEditableString() method returns
a string that users can edit. This string is displayed in the text edit field in IvySheet.
getTypeDescription() returns a human readable short name of the type of data that the
object handles. There are two interfaces that a cell type can implement to specify
special behavior. These are IIndirectCell and ISpecialEditCell.
IIndirectCell is an interface that is used by cell types that wish to return another cell
type when referenced in formulae. Formula cells are themselves an example of
an indirect cell, where the contents of the cell is a formula, but the referenced
value should be the result of the formula. There is a single method that returns
the indirect cell contents.
ISpecialEditCell can be used to provide a dialog for editing the cell contents. This
enables custom user interfaces to be developed to aid users with editing uncer-
tainty details. It consists of a single method that returns a boolean flag indicating
whether or not the contents were successfully changed.
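Indirect resolution can be sketched as a loop that dereferences until a non-indirect cell is reached. The types and the method name getIndirectContents() are assumptions for illustration; the thesis only states that the interface has a single method returning the indirect cell contents:

```java
// Sketch of indirect-cell resolution: a reference follows the indirection
// until it reaches a non-indirect cell (hypothetical types, not IvySheet's).
interface Cell { }

interface IndirectCell extends Cell {
    Cell getIndirectContents();
}

class ValueCell implements Cell {
    final double value;
    ValueCell(double value) { this.value = value; }
}

class FormulaCell implements IndirectCell {
    private final Cell result;               // result of evaluating the formula
    FormulaCell(Cell result) { this.result = result; }
    public Cell getIndirectContents() { return result; }
}

class Resolver {
    // The referenced value of a formula cell is its result, which may itself
    // be indirect, so dereferencing loops until a concrete cell appears.
    static Cell resolve(Cell c) {
        while (c instanceof IndirectCell) {
            c = ((IndirectCell) c).getIndirectContents();
        }
        return c;
    }
}
```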
The plugins also add uncertainty abstraction model handlers. Support for the DUM
is built into IvySheet and is provided through the IDUMMethod interface shown in
Figure 7.18. Both the possibilistic and probabilistic views return an UncertaintyRangeSet
object, the details of which are shown in Figure 7.19. The UncertaintyRangeSet
is a set of UncertaintyRanges. An uncertainty range describes the uncertainty for a
range of potential collapses. This simplifies programming tasks for developers of vi-
sual elements, since they only need to consider the values that are in the uncertainty
range set. For convex data there will only be one uncertainty range in the set. Visual
elements can be designed to make use of the UncertaintyRangeSet, thereby making
them independent of the particular cell type that they are passed.
Figure 7.18: The IDUMMethod Interface
Figure 7.19: The UncertaintyRange and UncertaintyRangeSet Classes
The visual element objects represent nodes in a scene graph and implement the IVi-
sualElement interface shown in Figure 7.20. The build() method is invoked whenever
changes are made to the layout of the visualization sheet that affect this element. It is
passed an array of cells that are on the same row of the visualization sheet. These cells
typically include formulae that source information from elsewhere in the workbook
and transform it into an appropriate form. When any of the dependent cells change, the
visual element is notified through the inherited notifyDependenciesChanged() method.
The getTopNode() method returns the scene graph node that is managed by this visual
element.
Figure 7.20: UML Inheritance Diagram for the IVisualElement Interface
The final type of extension that the plugins can provide are user aids. These are
items that appear in the menus, such as saving the workbook to disk. The Action
interface is used to add menu items in Java and user aids must implement this interface.
7.4 Summary

This chapter presented the design and architecture of a system for the modeling and
visualization of information uncertainty. This architecture integrates the ingredients
of Chapters 3-6 together into a coherent and extensible whole. The user experience
is designed to be familiar to spreadsheet users, which reduces training requirements.
Visualization is facilitated by a visualization sheet, which enables users to use formulae
to map information to visual element parameters. The essential infrastructure of the
spreadsheet system is provided by four core components: the kernel, the language
component, the dependency component, and the spreadsheet data structure component.
Usable functionality is provided by plug-ins, which are responsible for uncertainty
encapsulated cell contents, uncertainty abstraction, and visual elements.
CHAPTER 8
Advanced Features and Extensibility
8.1 Introduction

The core features together provide a fully functional and integrated information
uncertainty modeling and visualization system. This chapter investigates advanced features
and extensibility. The advanced features enable users to work at a higher level and with
greater control over its operation. The extensibility enables the system to incorporate
new uncertainty modeling and visualization techniques.
One issue with spreadsheet systems is that large spreadsheets can become difficult
to understand. Problems can often be approached hierarchically by breaking large parts
into smaller sub-parts. To address this mode of operation we introduce the hierarchical
spreadsheet, which enables sheets to be embedded within other sheets, forming a tree
of sheets.
The visualization spreadsheet offers fine grained control over the structure of a
visualization. However, this approach differs from commercial spreadsheet systems,
which offer less flexible, more targeted visualization techniques [26]. We offer embedded
visualizations as an approach that is similar to the present commercial offerings.
Embedded visualizations represent specific visualization techniques that are placed
within a cell. Floating observers are windows that display a cell in a floating window,
which enables them to remain on screen while users navigate around the spreadsheet.
To accommodate user- and domain-specific needs, several parts of the framework
may require customization. For example, these customizations include the ability to
select alternative uncertainty propagation models. Furthermore, uncertainty modeling
and visualization techniques continue to appear. For this reason the system needs to
be extensible, enabling new data types, propagation methods, visual elements, and
purpose specific formula functions to be added.
This chapter is organized as follows. Section 8.2 describes the advanced features
of the system, including hierarchical spreadsheets, floating observers and embedded
visualizations, and customization options. Section 8.3 covers extensibility and lists the
steps needed to add a new information uncertainty modeling technique. Section 8.4
contains a small case study that illustrates how the extensibility of the system can be
exploited to provide a tool to aid GPGPU algorithm development.
8.2 Advanced Features

This section describes three advanced features of the system in the following order:
first, hierarchical spreadsheets; then embedded visualizations and floating observers;
and finally, the customization capabilities.
8.2.1 Hierarchical Spreadsheets

The term hierarchical spreadsheet is used here to refer to spreadsheets arranged in a
tree structure. Such a structure can be useful, since lower-level spreadsheets can be
used to provide greater levels of detail, while higher-level concepts can be explored
in the higher-level sheets. Particularly, high-level planning can be conducted, leaving
details to be defined later in lower-level sheets. There are two major issues that need
to be solved in order to support a hierarchical structure. The first is how to build the
structure in the first place, and the second is how to pass information between parent
and child spreadsheets.
The first issue can easily be addressed by generalizing the encapsulation approach
to embed a sheet within the cell of another sheet. Using this method, the hierarchical
structure is implicit, because the parent spreadsheet contains the child spreadsheet.
Prior work, such as the ASP [95], mentions the capability of embedding a spreadsheet
within a cell. However, it does not go on to solve the second issue of how to
exchange necessary information with the parent sheet. The principal problem with
their approach is that the cell contains a spreadsheet object but it is not specified how
to interface with such an object. Either a formula must include knowledge of the inter-
nal working of the spreadsheet, or the spreadsheet object must have generic methods
for interrogation, or both.
If the child spreadsheet is viewed as a function that returns a value, the problem
becomes simpler and more intuitive. It is simpler because we only need to provide
interrogation for the return value. It is more intuitive because this fits with the spread-
sheet paradigm: formulae are functions that return a single value. This capability can
be provided by generalizing further components of the thesis: the representative value
of the spreadsheet is the return value. One embodiment of this approach is to make the
upper left cell (at address A1) the representative value.
Spreadsheets in general rely on a pull paradigm, where information is pulled in as
required1. This “pull” mechanism is achieved using cell references in a formula. Fur-
ther, commercial spreadsheet packages support the pulling of information from cells
in other sheets of the same workbook. For example, Microsoft Excel uses the exclamation
mark to signify a sheet name prefix to a cell address [26]. However, this was
not designed for hierarchical spreadsheets and only allows for absolute addressing.
Computer file systems have long supported a hierarchical arrangement with both abso-
lute and relative addressing. By combining the address prefix scheme of commercial
spreadsheets with the relative addressing scheme of file systems, the child spreadsheet
can pull information from parent-, sibling-, and child spreadsheets in a flexible manner.
Thus far the editing of novel cell types is typically done through the text field. An
embedded spreadsheet can be defined by text; for example, as a map from addresses
to contents and attributes. However, it is not intuitive to edit this text based definition
as a means for updating the spreadsheet. Therefore, it is desirable to allow the user to
edit the spreadsheet using the spreadsheet view.
8.2.1.1 Prototype System
Figure 8.1 shows a prototype implementation of the hierarchical spreadsheet extension.
A new cell type is added, called the SpreadsheetCell. Adding a SpreadsheetCell creates
a new spreadsheet in the workbook. The unique name of the embedded sheet uses the
parent spreadsheet name as a prefix with an exclamation mark as the separator. This
strategy can be employed recursively using the exclamation marks to separate each
sheet name. The traditional approach allows users to select any sheet from a linear list
of sheets. A more intuitive means for selecting sheets in a hierarchical structure is to
use a tree control.
Table 8.1 compares the SpreadsheetCell to an IntervalCell. Both cell types store
more than a single value: The IntervalCell encapsulates the middle number and the
margin of error; while the SpreadsheetCell links to another sheet in the workbook. The
representative value for the SpreadsheetCell is its upper left hand cell. It is possible
for that cell to itself contain a SpreadsheetCell, allowing arbitrary depth. The editing
interface for the interval is to use the text edit field at the top of the window. Such a
text interface is unintuitive for editing an entire spreadsheet. Therefore, when the user
wishes to edit the contents, the embedded spreadsheet is displayed. To insert a sheet at
the current cell, the user can type “Spreadsheet(name)” into the text edit field, where
1Some researchers have experimented with push operations in spreadsheets, e.g. [65]
Figure 8.1: Hierarchical Spreadsheet Prototype. The Parent Sheet (Left) Contains theChild Sheet (Right)
name is the unique suffix of the new sheet’s name.
Category              IntervalCell              SpreadsheetCell
Information stored    Middle number and         Reference to the child
                      error bounds              spreadsheet
Representative value  Middle number             Representative value of the
                                                cell at A1 of the child sheet
Editing interface     Text edit field           Jump to the spreadsheet
Table 8.1: Comparison of IntervalCell and SpreadsheetCell
The formula language is also updated to allow relative addressing. Table 8.2 il-
lustrates four examples of cell addresses and their interpretation within the addressing
scheme. The first two are absolute addresses and can be found in most commercial
spreadsheet programs. The second two are relative addresses, incorporating concepts
from file paths.
Example      Description
B7           Cell in column B, row 7, of the current sheet
June!B7      Cell in column B, row 7, of the sheet named June
!June!B7     Cell in column B, row 7, of the child sheet named June of the current sheet
!..!June!B7  Cell in column B, row 7, of the sheet named June that is a child of this
             sheet’s parent (i.e. a sibling sheet)
Table 8.2: Examples from the Prototype Addressing Scheme
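A resolver for this addressing scheme can be sketched as follows. The code is illustrative, not the prototype's actual parser: a sheet's full name uses the exclamation-mark separators described above, and the result is returned as an absolute “sheet!cell” string:

```java
// Sketch of the prototype's hierarchical addressing (Table 8.2): a leading
// "!" makes the path relative to the current sheet, and ".." steps up to the
// parent, as in file-system paths.
class AddressResolver {
    static String resolve(String currentSheet, String address) {
        String[] parts = address.split("!");
        if (parts.length == 1) {
            return currentSheet + "!" + parts[0]; // plain cell ref: current sheet
        }
        String base;
        int first;
        if (address.startsWith("!")) {            // relative to the current sheet
            base = currentSheet;
            first = 1;                            // parts[0] is the empty prefix
        } else {                                  // absolute sheet name
            base = "";
            first = 0;
        }
        // All segments but the last name sheets; the last is the cell address.
        for (int i = first; i < parts.length - 1; i++) {
            if (parts[i].equals("..")) {
                int cut = base.lastIndexOf('!');
                base = (cut < 0) ? "" : base.substring(0, cut);
            } else {
                base = base.isEmpty() ? parts[i] : base + "!" + parts[i];
            }
        }
        return base + "!" + parts[parts.length - 1];
    }
}
```

With the current sheet “Main!Q1”, the sibling address “!..!June!B7” resolves to “Main!June!B7”, mirroring the last row of Table 8.2.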
This addressing scheme can be used to provide named fields. Named fields are
cells that are referenced using a human readable name rather than their row and column
address. Embedding a spreadsheet with a specific name and then referring to the A1
cell of that sheet using relative addressing achieves the equivalent of a named field.
This is made more intuitive if the language component automatically appends A1 to
relative addresses that end with an exclamation mark. An example is a spreadsheet
that contains an embedded sheet named “Tax”. Cells within the main sheet can refer
to the return value of the child sheet using “!Tax!”. An example formula could read
“= B7 * !Tax!”, indicating that the result is the value of cell B7 multiplied
by the named field “Tax”.
8.2.1.2 Advantages
The addition of the hierarchical spreadsheet feature makes it possible to work at multiple
levels. For example, high-level planning can occur in the main spreadsheet.
Planning and Problem Solving Using a hierarchical spreadsheet system allows the
user to delay the modeling of details. A place holder value can be used during the
planning phase. The user can then return to the place holder later and replace it with
an embedded spreadsheet that contains the details. This process can be applied re-
cursively, progressively refining details of the user’s data model. The detail sheets can
incorporate sub-calculations and metadata. For example, they may contain annotations
describing rationale, measurement procedures, or other relevant information.
More Complex Problems As the size of a spreadsheet grows, the user typically sep-
arates related modular parts into individual sheets. This temporarily helps to manage
the complexity; however, the list of sheets is linear and it is up to the user to maintain a
mental map of the structure. At greater levels of complexity, navigating and understanding
the spreadsheet becomes increasingly difficult for the user.
The hierarchical spreadsheet structure more naturally represents typical decomposition
of problems.
Summarization and Abstraction There are many problems for which spreadsheets
are used that naturally lend themselves to a hierarchical structure. For example, chrono-
logical problems are usually broken down into hierarchical time units. Years can be
broken down into months, which break down to weeks, and so on.
8.2.2 Floating Observers and Embedded Visualizations

A floating observer is a free-floating online view of a cell. Free-floating means that it is
placed within its own window, such that it can be resized and freely placed anywhere
on the display. The online property refers to the immediate update of the observer
whenever the contents of the observed cell changes.
Floating observers enable an alternative to the visualization sheet approach. Rather
than drive the visualization through a visualization sheet, the visualization system is
implemented through novel cell types. This enables specific purpose visualization
techniques, such as line graphs or pie charts, to be defined using the same mechanism
developed for information uncertainty cell types. The display canvas of the visualiza-
tion system is the cell area, thus embedding visualizations within the spreadsheet. This
is similar to Information Visualization Spreadsheets [46]. Users can configure a float-
ing observer so that the visualization continues to be visible after they navigate away
from the sheet that holds the visualization cell.
Figure 8.2 shows a floating observer that is connected to a cell containing an uncer-
tainty line graph. The uncertainty line graph celltype uses the UUM. The uncertainty
space for each variable in the data set is sampled and shaded polygons are connected
between variables. The data set is specified by the fourth parameter as a range of cells.
In this case, the data set consists of intervals, producing an opaquely shaded poly-
gon. As an added feature, the uncertainty line graph celltype detects when the cell is
too small for the labels to fit and hides them. This enables many such graphs to be
readable when packaged into small areas, allowing “small multiples” [118] style tiled
visualization.
Figure 8.2: Floating Observer, Observing the Uncertainty Line Graph in Cell D14
The floating observer is a non-addressable cell that exists within the dependency
tree, although it must be a leaf node. The dependency tree is used to ensure that it
is updated whenever the cell contents changes. Figure 8.3 shows the dependency tree
for the floating observer of Figure 8.2: the observer is external to the spreadsheet data
structure, but is dependent on the cell in the spreadsheet.
Figure 8.3: Dependency Tree for the Floating Observer from Figure 8.2
Commercial spreadsheet systems (e.g. Microsoft Excel [26]) currently use graph-
ical user interfaces for managing visualizations. We use a special edit mode that is
provided by the visualization celltypes to present equivalent graphical dialogs for their
configuration. Commercial spreadsheets also present wizard interfaces for inserting
new visualizations and the equivalent can be achieved here using user aids.
The advantage of the floating observer and embedded visualization approach is that
it offers a visualization interface that is more consistent with existing commercial ap-
plications. However, unlike popular commercial offerings, the visualization is still held
within a cell and is therefore addressable. When several embedded visualizations are
placed in cells next to one another, they form a tiled visualization. The disadvantage
of the embedded visualization approach is that it is not as flexible as the visualization
sheet. Flexibility can be an important requirement for information uncertainty visu-
alization. As the information uncertainty visualization techniques mature, we would
expect that popular uncertainty visualization methods will be distilled into celltypes
for use in embedded visualization.
8.2.3 Customization

Different users have different requirements for their uncertainty modeling and
visualization. For example, in some large datasets it may be desirable to simplify
uncertainty propagation at the expense of precision, while some mathematical models may
require more precise measures. In addition, some application domains may require
different treatment of uncertainty to others. For these reasons it is important that the
system can be customized to meet the needs of users.
Our system can be customized in three ways. Firstly, the uncertainty modeling data
types that are available can be customized. New information uncertainty models can
be added and existing types can be excluded. Secondly, the propagation models can
be changed. This enables users to implement or select propagation models that are
appropriate to their task. Thirdly, different uncertainty abstraction model handlers can
be chosen. For example, one handler might treat intervals as a uniform distribution
when viewed from a probabilistic point of view while another might use a normal
distribution.
8.2.3.1 Configuring the Uncertainty Data Types
Some users and application domains may wish to use a particular set of data types and
exclude others. It is therefore desirable to enable users to alter the list of active cell
types that the running system maintains.
Figure 8.4 shows the dialog that enables users to edit the list of cell types directly.
New cell types can be added to the list by loading the appropriate plugin using the
button provided. Cell types are given an opportunity to handle the string contents of a
cell based on their order in the list and the top list item has the highest priority. Buttons
are provided to rearrange the order of the list.
Figure 8.4: CellType List Editor
Any alteration to the list of active cell types requires that the system adapt the cur-
rent spreadsheet contents to reflect this change. For example, if a spreadsheet currently
contains an interval “10 +- 5” and the IntervalCell type becomes disabled, then that cell
needs to be converted into another cell type. This should be done automatically by the
software.
A trivial implementation is to re-evaluate all strings in the spreadsheet. However,
the cells that need to be re-evaluated can be determined from the changes that are made
to the list. If a cell type was disabled, then all cells of that type must be re-evaluated.
If a cell type was enabled, then all cells whose type is lower in the list must be
re-evaluated. If the order of cell types was changed, then only those cells whose type
lies between the top- and bottom-most changed positions need to be re-evaluated.
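These rules can be sketched as a small planner over the priority list of cell-type names. The code is hypothetical and treats one kind of change at a time, as the text does; for a reorder, the two lists are assumed to contain the same types:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the re-evaluation rules above for the three kinds of change to
// the cell-type priority list; each method returns the affected type names.
class ReevalPlanner {

    // Type disabled: only cells of that type need re-evaluation.
    static List<String> afterDisable(String disabledType) {
        return List.of(disabledType);
    }

    // Type enabled: every type ranked below the new type may lose cells to it.
    static List<String> afterEnable(List<String> newList, String enabledType) {
        int pos = newList.indexOf(enabledType);
        return new ArrayList<>(newList.subList(pos + 1, newList.size()));
    }

    // Pure reorder: only types between the top- and bottom-most moved
    // positions can change which handler claims their cells.
    static List<String> afterReorder(List<String> oldList, List<String> newList) {
        int top = -1, bottom = -1;
        for (int i = 0; i < newList.size(); i++) {
            if (!newList.get(i).equals(oldList.get(i))) {
                if (top < 0) top = i;
                bottom = i;
            }
        }
        if (top < 0) return List.of();            // nothing moved
        return new ArrayList<>(newList.subList(top, bottom + 1));
    }
}
```

Swapping the top two of three types, for instance, leaves the third type's cells untouched, which is exactly the saving over naively re-evaluating every string in the spreadsheet.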
8.2.3.2 Configuring the Propagation Model
The propagation model defines a set of functions that handle operations between val-
ues. The formula language component parses formulae and invokes the appropriate
propagation method according to the propagation model. The invoked method is re-
sponsible for combining the input parameters and returning a result.
There are multiple alternative methods for combining uncertainty information. There-
fore, the user requires the flexibility to control which methods are active from a se-
lection of alternatives. The methods that can be chosen are made available through
plugins. This allows new, esoteric, and domain specific functions to be developed for
the system.
There are two mechanisms that enable users to select propagation methods. The
first allows users to individually manage the method mappings in the propagation
model. A dialog that facilitates this mode of editing in our prototype is illustrated
in Figure 8.5. However, the number of propagation methods tends to be large and this
technique can become cumbersome. The second mechanism uses a layer of abstrac-
tion to group similar propagation methods into a propagation model and group several
propagation models into the currently active propagation model set. The propagation
model set is an ordered set of propagation models, where the order governs the priority
of the propagation model. The higher positioned propagation models override their
lower positioned siblings.
Figure 8.5: Propagation Model Editor
Internally, a propagation model set provides the same signature to method mapping
as a propagation model and fulfills the same contractual obligations from a program-
ming point of view. The UML diagram shown in Figure 8.6 illustrates this relationship,
where the IPropagationModel contract is fulfilled by the PropagationModelSet class.
There is one globally active propagation model set, which holds the currently active
propagation models. Plug-ins register propagation models with the kernel during
initialization. Users can edit the globally active propagation model set using a dialog as
shown in Figure 8.7. This allows the user to enable, disable, and reorder the available
propagation models. The “Edit” button edits the currently highlighted propagation
model using the Propagation Model Editor shown in Figure 8.5.
Figure 8.6: Propagation Model, Method, and Model Set
Figure 8.7: Propagation Model Set Editor
An advantage of the propagation model set approach is that the user can rapidly
switch between different groups of propagation methods. Users can also disable an
entire propagation model, rather than each method individually.
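This composite arrangement can be sketched as follows. Only the IPropagationModel and PropagationModelSet names come from our design; the lookup signature is assumed for illustration:

```java
import java.util.*;
import java.util.function.BinaryOperator;

// Sketch of the composite relationship: a set of models fulfills the same
// lookup contract as a single model (the lookup signature is assumed).
interface IPropagationModel {
    // Returns the handler for an operator between two cell-type names, or null.
    BinaryOperator<Object> lookup(String op, String leftType, String rightType);
}

class PropagationModel implements IPropagationModel {
    private final Map<String, BinaryOperator<Object>> methods = new HashMap<>();
    void register(String op, String l, String r, BinaryOperator<Object> fn) {
        methods.put(op + ":" + l + ":" + r, fn);
    }
    public BinaryOperator<Object> lookup(String op, String l, String r) {
        return methods.get(op + ":" + l + ":" + r);
    }
}

// Higher-positioned models override lower-positioned siblings: the ordered
// list is consulted in priority order and the first match wins.
class PropagationModelSet implements IPropagationModel {
    private final List<IPropagationModel> models = new ArrayList<>();
    void add(IPropagationModel m) { models.add(m); }
    public BinaryOperator<Object> lookup(String op, String l, String r) {
        for (IPropagationModel m : models) {
            BinaryOperator<Object> fn = m.lookup(op, l, r);
            if (fn != null) return fn;
        }
        return null;
    }
}
```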
8.2.3.3 Configuring the Abstract Uncertainty Model Handlers
Abstract uncertainty model handlers implement the interface between the visual map-
ping system and the information uncertainty modeling techniques. These handlers
perform implicit conversion of uncertainty information to fulfill their contract. For
example, when using DUM, a handler will be responsible for providing a possibilis-
tic view of a Gaussian distribution. To accommodate variations in user and application
needs, these handlers need to be changeable.
Figure 8.8 shows a dialog that allows users to change the DUM handler for each
cell type. The left-hand column lists all of the cell types that are currently loaded by
the spreadsheet system. Users can select the currently active DUM handlers in the
right-hand column, but only those cell types that implement the DUM interface are
editable. The area on the right provides a description of the currently selected DUM
handler.
The advantage of selecting abstract uncertainty model handlers is that users can
choose the handler that is appropriate to their task. The handlers are changed at runtime
with immediate effects, enabling users to compare the difference in real-time.
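The runtime selection of handlers can be sketched as a simple registry; the class and method names below are illustrative only:

```java
import java.util.*;

// Illustrative sketch of runtime-swappable handler selection per cell type.
class DumHandlerRegistry {
    private final Map<String, String> active = new HashMap<>();          // cell type -> active handler
    private final Map<String, List<String>> available = new HashMap<>(); // cell type -> offered handlers

    // Plug-ins offer handlers; the first offered handler becomes the default.
    void offer(String cellType, String handler) {
        available.computeIfAbsent(cellType, k -> new ArrayList<>()).add(handler);
        active.putIfAbsent(cellType, handler);
    }

    // Swapping takes effect immediately: the next lookup returns the new handler.
    void select(String cellType, String handler) {
        if (!available.getOrDefault(cellType, List.of()).contains(handler))
            throw new IllegalArgumentException("unknown handler: " + handler);
        active.put(cellType, handler);
    }

    String handlerFor(String cellType) { return active.get(cellType); }
}
```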
8.3 Extensibility
There already exist many ways to represent information uncertainty. However, as
frameworks such as Klir’s generalized information theory mature, more modeling
techniques will continue to appear. For this reason, the requirement for extensibility
was anticipated in our design.
Figure 8.8: Dual Uncertainty Model Selector
This section examines the extensibility capabilities, first
from the point of view of information uncertainty, and then from a general purpose
point of view.
The design of our system is based around a plug-in architecture. There are six ways
in which a plug-in can extend the functionality of the system:
1. New cell types can be added. These are responsible for parsing strings and hold-
ing the contents of cells. Adding CellTypes increases the types of data that can
be stored in a cell.
2. New functions can be added to the formula language. This enables the plug-ins
to define purpose specific functions, such as statistical methods.
3. New propagation methods and models can be added to provide for different
mathematical rule sets.
4. New uncertainty abstraction model handlers can be added, offering users greater
choice of abstraction methods.
5. New visual elements can be added. Visual elements are displayed on the visual-
ization canvas and are used in the visualization sheet.
6. New user aids can be created, which are packages of code that are executed
from the menu. These are the most free-form additions and their uses range from
offering macro-like behavior to the implementation of complete sub-modules.
The extensibility capabilities are exposed to the plug-in at load time through the IKer-
nelRegistration interface. The registration functions are called by the plug-in to notify
the system of the new features it supports. Some features can be registered in a disabled
state, which users can manually enable using the customization dialogs. The scope of
our prototype was limited to only one uncertainty abstraction model (the DUM). How-
ever, this is not a fixed limitation and other implementations may support as many
abstraction models as are appropriate.
We now consider how support for a new uncertainty modeling technique can be
added to the system. The following steps would be taken by the plug-in:
1. Register a new CellType, which is responsible for storing the uncertainty details
in the spreadsheet data model.
2. Add a propagation model to facilitate correct propagation of the uncertainty de-
tails. The propagation model contains a propagation method for each of the op-
erators (e.g. addition, subtraction, etc.). The propagation methods take objects
of the new CellType as parameters and return a new cell as the result.
3. Register appropriate uncertainty abstraction model handlers for the new Cell-
Type.
Users can now enter data using the new uncertainty model. The new cell type will parse
the string and store the information. Any formulae involving these cells will use the
new propagation methods. Visualizations that are built will extract their information
through the newly added uncertainty abstraction model handler. Furthermore, existing
formulae and visualizations will also be compatible, enabling users to convert existing
variables to the new cell type.
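These three steps can be sketched against the registration interface. Only the IKernelRegistration name comes from our design; the remaining types, signatures, and the example technique are assumed for illustration:

```java
// Hypothetical shapes for the three registration steps; only IKernelRegistration
// is named in the thesis, the rest is assumed for illustration.
interface ICellType { String name(); Object parse(String input); }
interface IDumHandler { String describe(); }

interface IKernelRegistration {
    void registerCellType(ICellType type);
    void registerPropagationModel(String name, Object model);
    void registerDumHandler(String cellTypeName, IDumHandler handler);
}

// A plug-in adding support for a new (hypothetical) uncertainty technique
// performs the three steps during initialization.
class TriangularFuzzyPlugin {
    void initialize(IKernelRegistration kernel) {
        // Step 1: register the cell type that stores the uncertainty details.
        kernel.registerCellType(new ICellType() {
            public String name() { return "TriangularFuzzyCell"; }
            public Object parse(String input) { return input; } // parsing elided
        });
        // Step 2: add a propagation model for the operators.
        kernel.registerPropagationModel("TriangularFuzzy", new Object());
        // Step 3: register an uncertainty abstraction model handler.
        kernel.registerDumHandler("TriangularFuzzyCell",
                () -> "possibilistic view of a triangular fuzzy number");
    }
}
```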
Separate to uncertainty modeling techniques, new visual elements can also be reg-
istered. Users access visual elements through the visualization sheet by mapping infor-
mation to them. The ability to add new visual elements increases the sophistication of
the visualization system. For example, it is possible to create a suite of visual elements
that interface to Visualization ToolKit2 objects.
Embedded visualization types are added to the system as new CellTypes. For ex-
ample, a ScatterPlotCell can implement the scatter plot [40] visualization technique.
An example string format for this cell is “ScatterPlot(cellrange)”, where the cell range
defines the cells that are included in the plot. For embedded visualizations there is no
need to add any corresponding propagation or abstraction models.
The methods for extension described so far are not limited to dealing with infor-
mation uncertainty. Only uncertainty abstraction model handlers are specific to in-
formation uncertainty. New cell types can conceivably manage any type of data for
which propagation methods can be used to offer convenient operations. Functions in
the formula language can also be general purpose.
2The Visualization ToolKit (VTK) is a freely available visualization system (see [105]).
8.4 Case Study: GPGPUSheet
This chapter has discussed several advanced features, which are explored in the case
study of the next chapter. The case study presented here investigates the extensibility
of our system to more general purpose problems. A field of research that has
recently developed is General Purpose Graphics Processing Unit (GPGPU) program-
ming. GPGPU exploits the advances in processing power available from the parallel
stream processing architecture found in graphics processors. Examples of use range
from image processing through to fluid flow simulation. Further information is avail-
able from the GPGPU website3, including tutorial sessions that have run at ACM SIG-
GRAPH 2004, IEEE Visualization 2004, and Supercomputing 2006.
Here we present GPGPUSheet, which is a prototype system for creating and visu-
alizing GPGPU applications. GPGPU uses graphics card textures for memory storage
and fragment shader programs to apply convolution kernels. For this purpose we in-
troduce two new cell types: the TextureCell and the KernelCell (see Table 8.3). The
TextureCell wraps an image that can be loaded onto the graphics card as a texture. The
KernelCell is a graphics card fragment shader program, in human-readable code form.
Name          Description
TextureCell   1D, 2D, or 3D image data with 1 (Luminance), 3 (RGB), or 4 (RGBA) components
KernelCell    Fragment shader program
Table 8.3: Novel Cell Types for GPGPU
The spreadsheet approach to GPGPU facilitates comparison tasks, such as effect
choice for parameters or inspection of intermediate states during repeated convolu-
tions. For example, the spreadsheet shown in Figure 8.9 compares results of a GPU
based edge finding algorithm under different parameters. The parameters “threshold”
and “mode” are stored in rows 3 and 4. Row 5 contains the results from applying the
kernel in cell B2 to the texture in cell B1 for each parameter. The convolution kernel
in cell B2 is displayed in the “Cg Code Editor” window.
A convolution kernel is executed by using a function call RunKernel in a formula.
The output is itself a TextureCell, thus allowing the output from one kernel to be the
input for another. The new functions that GPGPUSheet introduces are listed in Ta-
ble 8.4. The parameters of the RunKernel function consist of a comma separated list
of program parameters. The Image functions construct a texture from the numeric
values contained in a cell range. The Pixel functions extract a numeric value from a
particular pixel within a texture. PadTex creates a texture of a particular dimension
and fills it with a cell. If that cell happens to be a texture, then it is tiled. ResampleTex
3The GPGPU website, http://www.gpgpu.org/
creates a scaled copy of another texture.
Function                         Example of Use
RunKernel                        RunKernel(B1, “threshold”=B3, “mode”=B4)
ImageL, ImageRGB, ImageRGBA      ImageRGB(A1:F7)
PixelR, PixelG, PixelB, PixelA   PixelR(B2, 10, 10)
PadTex                           PadTex(B2, 512, 512)
ResampleTex                      ResampleTex(B2, 256, 256)
Table 8.4: Novel Functions for GPGPU
Textures are arrays of pixel data. The arrays can be one-, two-, or three-dimensional,
and each pixel can contain either one (luminance), three (RGB), or four (RGBA) com-
ponents. The components can either be integer or floating point values. In some in-
stances computer graphics hardware can be limited to texture sizes that are a power
of two and the user will need to either pad or resample their data. The PadTex and
ResampleTex functions can be used to overcome this limitation. Textures can either
be built from ranges of cells in the spreadsheet, or loaded from an external file through
the menu option.
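The rounding involved in padding to a power-of-two size can be sketched as follows; this is an illustrative helper, not part of the prototype:

```java
// Illustrative sketch: hardware that only accepts power-of-two textures needs
// each dimension rounded up, which is what a PadTex-style call works around.
class TextureSizing {
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1; // double until it covers n
        return p;
    }
}
```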
The KernelCell implements the ISpecialEditCell interface, allowing it to present
a complete syntax highlighting editor (as shown in Figure 8.9). A menu option to
insert base code for well-known kernels was added to provide a starting point for
developing new convolution kernels.
In conclusion, this case study shows the extensibility of our spreadsheet-based de-
sign to other problems. GPGPUSheet is a valuable tool for assisting research into GPU
based algorithms, such as our work in [14]. It enables the user to visually explore the
effects of an algorithm, to concatenate complex sequences of kernels, and to evaluate
changes to shader programs in real-time. Shader programs and textures were added as
novel cell types, and the formula language was extended to enable GPU bound opera-
tions to be carried out.
Figure 8.9: Prototype system for GPGPU visualization
CHAPTER 9
Evaluation
9.1 Introduction
This chapter provides an analysis of the performance of the system and compares it
to common alternatives. When a user wishes to model information uncertainty there
are two broad methods: the first is to use a numerical approach, which simulates the
uncertainty; the second is to use an analytical approach, where the parameters of the
uncertainty space are managed using mathematical principles. The objective for either
method is for the user to gain an understanding of the uncertainty space affecting their
model.
Numerical approaches typically come in three flavors: manual perturbations of
input variables, such as “what-if” exploration; automated regular perturbations, such
as animated stepping of variables; and random perturbations of input variables using
Monte-Carlo simulation. The results from manual perturbations are typically observed
by the user to mentally map the uncertainty space. Advancements in computing power
have made automated perturbation viable and this technique is now in common use
within certain sectors, including the financial markets sector.
Analytical techniques are available using several software packages. Commercial-
grade spreadsheet systems, such as Microsoft Excel and OpenOffice.org calc, include
statistical operators that enable modeling using normal distributions. Mathematical
packages, such as MatLab, tend to offer a selection of functions for dealing with a
wider range of distributions, but their use is essentially the same. The parameters
needed to model a normal distribution are the mean μ and the standard deviation σ. These
parameters are then passed to statistical functions for interpretation (e.g. see Table 9.1,
drawn from [78]).
Function                    Description
STDEV(range_of_values)      Calculates the standard deviation σ from a range of values
NORMDIST(x, μ, σ, FALSE)    Calculates the probability density at x
NORMDIST(x, μ, σ, TRUE)     Calculates the probability that a value ≤ x will occur
NORMINV(p, μ, σ)            Returns the value with cumulative probability p. This is the
                            inverse of NORMDIST
Table 9.1: A Selection of Normal Probability Functions in Microsoft Excel 2003
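The cumulative form of NORMDIST can be reproduced outside a spreadsheet. The sketch below uses the well-known Abramowitz and Stegun erf approximation (maximum absolute error about 1.5e-7); the class name is illustrative:

```java
// Illustrative sketch of NORMDIST(x, mu, sigma, TRUE) in ordinary code,
// using the Abramowitz-Stegun 7.1.26 approximation of erf.
class NormalDist {
    static double cdf(double x, double mu, double sigma) {
        double z = (x - mu) / (sigma * Math.sqrt(2));
        return 0.5 * (1 + erf(z));
    }

    static double erf(double x) {
        double sign = Math.signum(x);
        x = Math.abs(x);
        double t = 1 / (1 + 0.3275911 * x);
        // Horner evaluation of the degree-5 polynomial in t.
        double y = 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t) * Math.exp(-x * x);
        return sign * y;
    }
}
```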
The system designed in the course of this thesis takes an analytical approach to
information uncertainty and is most similar to using a spreadsheet system. In Sec-
tion 4.6 we examined the advantages of our approach over that of using a traditional
spreadsheet. In Section 6.4 we investigated how user-objectives can be used to drive
visualization with a case study in financial decision support. In this chapter we eval-
uate the complete system in three ways: firstly, we perform a quantitative analysis,
comparing our system to traditional numerical and analytical methods; secondly, we
describe user feedback that was gained from two surveys; and thirdly, we investigate
advantages of using our system in a business planning case study.
9.2 Quantitative Analysis
This section examines the performance issues and the potential for user errors. Research
by Brown and Gould [9] has found that errors are common in spreadsheets,
despite a high level of confidence by their creator. Furthermore, the majority of errors
they observed (65%) were due to mistakes in formulae. Reinhardt and Pillay [100]
analyzed errors made by computer literacy students in spreadsheet formulae and found
that 82% made errors in formulae that involved cell addressing. This percentage rose
to 93% for formulae involving financial functions. These observations suggest that
reducing the number and complexity of formulae should form a principal strategy in
substantially reducing the number of errors made.
Another approach to reducing spreadsheet errors is to work in groups. However,
Panko and Halverson [88] found that half of the spreadsheets generated by groups of
four still contained errors. This is an improvement over spreadsheets developed by in-
dividuals (81% down to 50%), but contrary to expectation, group collaboration did not
eliminate the “oversight errors” [88, pp. 6]. Furthermore, some errors were contentious
and although one member of the group recognized the problem, they were unable to
convince the other members of the group. This suggests that, practicality aside, solely
increasing the number of collaborators will not continue to improve the error rate.
In this section we measure the number of operations that the user is required to
perform under several common conditions. The operations are categorized by type
and summarized. The materials used in these experiments are:
• 1 PC
– OS: Microsoft Windows XP SP2 (32bit)
– Java Runtime: Sun Microsystems JRE 1.6.0_01
– CPU: AMD Athlon 64 X2 Dual Core 2Ghz
– RAM: 1GB
– Display: 1680x1050, 32bit color
• OpenOffice.org Calc 2.2.1, a freely available commercial grade spreadsheet
• IvySheet as at March 2007, our prototype system
We consider two modes of address. The first, construction, is where the spreadsheet is
constructed with an anticipation of the information uncertainty modeling requirements.
The second, retrospection, is where the spreadsheet is already built, but at least one
variable needs to be changed to use a different modeling technique.
There are a number of actions that the user can perform, which are categorized in
Table 9.2. Block pasting is an operation where a smaller copy region is pasted into
a larger area. This is typically used to replicate formulae from a template, such as
from a prototype cell to the remainder of a column. Modern spreadsheet software
automatically adjusts cell references in pasted formulae. We count a block paste as a
single copy+paste operation.
Label   Description
α       Enter data into a cell
β       Modify data in a cell
γ       Change layout (e.g. insert a column)
δ       Enter a formula
ε       Modify a formula
ζ       Copy+paste operation
η       Delete (clear) a cell
Table 9.2: Actions of the User
The data entry (α), related updates (β ), and cell removal (η) operations are least
likely to be the source for errors. This is because there are a limited number of things
that can go wrong: either the data is incorrect or the wrong cell was chosen. These three
operations also provide direct visual feedback, where the cell contents directly reflects
the new input. Layout changes (γ) are more likely to cause errors, albeit indirectly.
Not only can the layout change be wrong, but it may also contribute to subsequent
mistakes made due to the changed frame of reference. Formulae (δ and ε) provide
the most opportunity for error. They offer indirect feedback, where the formula is
revealed only on request. This makes it harder to recognize a problem. Similarly,
copy+paste operations (ζ ) will update any copied formulae automatically. Incorrect
formula updates may be hard to spot and often require manual formula inspection.
We do not weigh formula complexity in this evaluation. Our system has compara-
tively simpler formulae than the alternatives that we tested. This is due to automated
propagation. For example, the multiplication of two normal distributions using tradi-
tional means requires the formulae:
var(XY) = var(X) ∗ var(Y) + μ(X) ∗ μ(X) ∗ var(Y) + μ(Y) ∗ μ(Y) ∗ var(X)
μ(XY) = μ(X) ∗ μ(Y)
which is a total of six multiplications and two additions over two formulae. The
same operation is effected by
v′ = v1 ∗ v2
using our system, which is a single multiply in a single formula. While excluding for-
mula complexity biases results against our system, there are two reasons for doing this.
Firstly, it is possible that prudent users are able to define macros for certain commonly
used formulae. This would make the arithmetic complexity measure meaningless. Sec-
ondly, it is easy to introduce bias in the opposite direction. The formulae are mostly
of the same form and therefore it is possible that users will introduce fewer errors than
the complexity might otherwise imply.
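The propagation rule above can be checked numerically. The following sketch applies it to two independent normal variables; the class name is illustrative:

```java
// Illustrative sketch of the analytical product rule for two independent
// normally distributed variables, as used in the formulae above.
class NormalProduct {
    final double mean, var;
    NormalProduct(double mean, double var) { this.mean = mean; this.var = var; }

    static NormalProduct multiply(NormalProduct x, NormalProduct y) {
        // var(XY) = var(X)var(Y) + mu(X)^2 var(Y) + mu(Y)^2 var(X)
        double v = x.var * y.var + x.mean * x.mean * y.var + y.mean * y.mean * x.var;
        // mu(XY) = mu(X) mu(Y)
        return new NormalProduct(x.mean * y.mean, v);
    }
}
```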
9.2.1 Construction Experiments
For the initial experiment a spreadsheet with three inputs is used. The first variable is a
salary increase, the second is the tax rate, and the third is the initial salary. The output
is the net income, calculated over a number of years: netincome_i = salary_i − tax_i, where
salary_i = salary_{i−1} ∗ rate_growth and tax_i = salary_i ∗ rate_tax.
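The recurrence can be sketched directly in code. We assume the growth rate is applied as salary_{i−1} ∗ (1 + rate_growth), matching the spreadsheet in Figure 9.1; the names are illustrative:

```java
// Illustrative sketch of the reference model: net income over a number of
// years, assuming the growth rate is applied as salary * (1 + rate).
class SalaryModel {
    static double[] netIncome(double initialSalary, double growthRate,
                              double taxRate, int years) {
        double[] net = new double[years];
        double salary = initialSalary;
        for (int i = 0; i < years; i++) {
            if (i > 0) salary *= (1 + growthRate); // salary_i from salary_{i-1}
            double tax = salary * taxRate;         // tax_i = salary_i * rate_tax
            net[i] = salary - tax;                 // netincome_i = salary_i - tax_i
        }
        return net;
    }
}
```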
Figure 9.1 shows the spreadsheet without uncertainty information, henceforth re-
ferred to as the reference spreadsheet. In this instance, rategrowth is 7%, ratetax is 17%,
and salary1 is 60000. The construction cost is as follows:
• Text labels consume 6α .
Figure 9.1: Spreadsheet for First Experiment, Without Uncertainty
• Year labels consume 1α +1δ +1ζ , since they are regular.
• Initial variables require 3α .
• Formula for tax and netincome, for year 1 take 2δ .
• Formula for salary in year 2 takes 1δ .
• formulae for salary in years 3-7 takes 1ζ .
• formulae for tax and net income in years 2-7 takes 1ζ .
Total cost is 10α +4δ +3ζ .
9.2.1.1 Test I
Test I-A The first test is to use a normal distribution to model the salary growth
rate: μ = 0.07, and σ = 0.01. Figure 9.2 shows the spreadsheet constructed using our
system. The construction cost is the same as without uncertainty, 10α +4δ +3ζ . The
user simply enters “0.07@0.01” in the salary growth field.
Figure 9.2: Spreadsheet for First Experiment, With Uncertainty
Test I-B Using a traditional analytical approach is shown in Figure 9.3. Two columns
are needed for every variable that depends on the salary growth, one to hold the mean
and another to hold the standard deviation. The construction cost is as follows:
Figure 9.3: Spreadsheet for First Experiment, Analytical Approach Using TraditionalMethods
• Text labels consume 10α .
• Year labels consume 1α +1δ +1ζ , since they are regular.
• Initial variables require 5α , including the initial standard deviation of 0 for salary
in year 1.
• Formula for tax and netincome, for year 1 take 4δ , although the formulae are
more complicated.
• Formula for salary in year 2 takes 2δ .
• formulae for salary in years 3-7 takes 1ζ .
• formulae for tax and net income in years 2-7 takes 1ζ .
Total cost is 16α +7δ +3ζ , an increase of 6α +3δ .
Test I-C The Monte-Carlo method [3] is a numerical approach and takes a different
form. A prototype row is built that contains the entire model in a single row. The salary
rate will now be generated from a normal distribution, using the pseudo-random num-
ber generator. This is achieved using the following formula: “=NORMINV(RAND();μ;σ )”.
The prototype row is then duplicated to rows below an arbitrary number of times,
where each row represents a trial run in the simulation. Finally, the mean and standard
deviation for each of the output variables is calculated. Figure 9.4 shows part of the
spreadsheet. The mean is calculated using the formula “=AVERAGE(Ω1:Ωn)” and
standard deviation using “=STDEV(Ω2:Ω(n− 1))”, where Ω represents the column
and n represents the number of trial runs. The cost is as follows:
Figure 9.4: Monte-Carlo Spreadsheet for First Experiment
• Text labels for all columns (23α: 3 + 2 + 6 × 3).
• First, there are three columns for the input variables: tax rate, salary rate, and initial
salary (2α + 1δ).
• Following this are the calculated columns for tax and net income for year 1 (2δ ).
• Following this are calculated columns for salary, tax, and net income for years 2
through 7 (1δ +2ζ ).
• The prototype row is copied to rows below (1ζ ).
• Calculate mean and standard deviation for salary, tax, and net income and repeat
for all 7 years (6δ +1ζ )
Total cost is 25α +10δ +4ζ , an increase of 15α +6δ +1ζ over our system.
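The Monte-Carlo construction above can be mirrored in ordinary code, where a seeded Gaussian generator stands in for the NORMINV(RAND();μ;σ) formula. This is an illustrative sketch, not part of the prototype:

```java
import java.util.Random;

// Illustrative sketch of the Monte-Carlo approach: each trial row draws the
// growth rate from N(mu, sigma), evaluates the model, and the mean and
// standard deviation are taken over the trials.
class MonteCarloSalary {
    // Returns {mean, standard deviation} of the second year's net income.
    static double[] simulateSecondYearNet(int trials, long seed) {
        Random rng = new Random(seed);
        double mu = 0.07, sigma = 0.01, taxRate = 0.17, salary0 = 60000;
        double sum = 0, sumSq = 0;
        for (int t = 0; t < trials; t++) {
            double growth = mu + sigma * rng.nextGaussian(); // NORMINV(RAND();mu;sigma)
            double salary = salary0 * (1 + growth);
            double net = salary - salary * taxRate;
            sum += net;
            sumSq += net * net;
        }
        double mean = sum / trials;
        double sd = Math.sqrt((sumSq - trials * mean * mean) / (trials - 1));
        return new double[] { mean, sd };
    }
}
```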
9.2.1.2 Test II
For the second experiment, we consider the case where all three input variables are
uncertain: tax rate, salary rate, and initial salary. Tax rate will now be modeled using
a normal distribution with μ = 17 and σ = 3, the salary rate remains at μ = 0.07 and
σ = 0.01, and initial salary will now have a normal distribution of μ = 60000 and
σ = 1000.
Test II-A Using our system the construction cost is the same, as the user simply
enters “17@3” for the tax rate field and “60000@1000” for the initial salary. Total
cost: 10α +4δ +3ζ .
Test II-B Using an analytical approach in a traditional spreadsheet requires two fields
where previously there was one: one to hold the mean and one for standard deviation
for tax rate. This adds 2α , one for the text label and one for the additional field. The
standard deviation fields are already required to store the additional information in-
troduced by uncertainty in salary. However, the tax is now more complex as it must
combine the uncertain salary with the uncertain tax rate. In this case it is the multi-
plication of two normal distributions, the formula for which was described previously.
The remaining cost calculation is the same as in Test I-B. Total cost: 18α +7δ +3ζ ,
an increase of 8α +3δ over the reference sheet.
Test II-C Expanding the Monte-Carlo spreadsheet consists of using the pseudo-
random number generation for three fields. Thus the cost is 3δ for the first three
columns. Total cost: 23α +12δ +4ζ , an increase of 13α +8δ +1ζ over the reference
sheet.
9.2.2 Retrospection Experiments
In these experiments we evaluate the cost of changing uncertainty modeling techniques
in existing spreadsheets.
9.2.2.1 Test III
Our first test is to retrospectively upgrade the salary rate from a normal quantity to a
normal distribution. For this scenario the change was not anticipated by the user and
therefore the starting point is the reference spreadsheet, as shown in Figure 9.1. Thus, the
resulting spreadsheets from Test III should be the same as those in Test I.
Test III-A Using our system, the user simply modifies the salary growth field. Total
cost: 1β .
Test III-B Using the analytical approach in a traditional spreadsheet requires a change
in layout:
• First, the columns for standard deviation are added (3γ) and labeled (3α).
• The text labels for salary, tax, and net are updated to reflect that they are now
means (3β ).
• Then the text label for salary growth is changed to “salary growth mean” (1β ).
• If the new mean were different to the current value of the salary growth it would
need to be updated.
• The standard deviation text label is added and the standard deviation is entered
(2α).
• Zeros are added for the standard deviation columns in year 1 (3α).
• Formulae are entered for the standard deviation columns in year 2 (3δ )
• The row for year 2 is replicated for years 3-7, which is a short-cut to fill in
missing formulae (1ζ ).
Total cost: 8α + 4β + 3γ + 3δ + 1ζ (α: 3 + 2 + 3; β: 3 + 1; γ: 3; δ: 3; ζ: 1).
Test III-C Changing to a Monte-Carlo spreadsheet requires significant work as it
completely changes the spreadsheet layout. Typically this involves starting a new
spreadsheet:
• A new sheet is added to the system (1γ)
• Copy+paste the first two column headings, enter the remaining labels (21α +2ζ )
• Copy+paste the tax rate and initial salary (2ζ )
• Enter the formula to generate the salary growth rate from the pseudo-random
number generator (1δ )
• The calculated columns for tax and net income for year 1 are entered (2δ ).
• The calculated columns for salary, tax, and net income are replicated for years 2
through 7 (1δ +2ζ ).
• The prototype row is copied to rows below (1ζ ).
• Calculate mean and standard deviation for salary, tax, and net income and repeat
for all 7 years (6δ +1ζ )
Total cost: 21α + 1γ + 10δ + 8ζ. This cost is virtually equivalent to the cost for
construction in Test I-C, 25α + 10δ + 4ζ.
9.2.2.2 Test IV
For the next set of experiments we consider the reverse, removing the uncertainty infor-
mation from salary rate. The resulting spreadsheet should be the reference spreadsheet,
as shown in Figure 9.1.
Test IV-A Using our system, the salary rate is changed back to “7%”. Total cost: 1β .
Test IV-B Using the analytical approach in a traditional spreadsheet again requires a
change in layout:
• First, the columns for standard deviation are removed (3γ).
• The text labels for salary, tax, and net are updated to reflect that they are no
longer means (3β ).
• Then the text label for salary growth is changed to “salary growth” (1β ).
• The standard deviation field for salary growth is deleted (1η). (the text label was
removed when the column it was in was deleted)
Total cost: 4β +3γ +1η .
Test IV-C Assuming that the original sheet was kept when the new sheet was added
for the simulation, then the simulation sheet would simply need to be removed. Total
cost: 1γ .
9.2.2.3 Test V
For the next set of tests the spreadsheet already contains the salary growth as a nor-
mal distribution and the tax rate is promoted to a normal distribution. The starting
conditions for Test V are the spreadsheets constructed in Test I.
Test V-A Using our system the tax rate field is changed. Total cost: 1β .
Test V-B Using the analytical approach in a traditional spreadsheet requires some
changes:
• The text label for the tax rate is changed to reflect that it is now a mean (1β )
• A standard deviation field is added and labeled for the tax rate (2α)
• The standard deviation calculation for tax in year 2 is updated (1ε)
• The standard deviation calculation is propagated to the other rows (1ζ )
Total cost: 2α +1β +1ε +1ζ .
Test V-C The field for tax rate is calculated using the pseudo-random number gener-
ator (1δ ) and this change is propagated to the other rows using copy+paste. Total cost:
1δ +1ζ .
9.2.3 Discussion
Figure 9.5 shows the construction costs graphically. The main type of operation carried
out during construction was data entry, α , followed by formula construction, δ . The
construction cost of our system is equivalent to that of the reference spreadsheet. There
were only five more keystrokes in Test I-A and seven more keystrokes in Test II-A
compared to the construction of the reference spreadsheet. Significantly, the spread-
sheet layout and all formulae are identical to the reference spreadsheet.
Figure 9.5: Construction Cost Graph
Both analytical probability and Monte-Carlo spreadsheets required a change in layout
and an increase in formula counts. Figure 9.6 compares the number of formulae
required. Although there are more formulae in the Monte-Carlo spreadsheets, they are
more consistent than their counterparts in the analytical spreadsheet. Combinations
of multiple uncertain variables quickly results in complicated formulae in analytical
sheets.
Figure 9.6: Number of Formulae in Construction Experiments
Table 9.3 lists the operations required for the retrospection experiments. In all
three experiments our system required only a single data field to be changed. Both
of the other methods required changes to either formulae or layout (see Figure 9.7),
irrespective of whether uncertainty information was being added (Tests 3 and 5) or
removed (Test 4).
Test   α    β    γ    δ    ε    ζ    η
3-A         1
3-B    8    4    3    3         1
3-C    21        1    10        8
4-A         1
4-B         4    3                   1
4-C              1
5-A         1
5-B    2    1              1    1
5-C                   1         1
Table 9.3: Retrospection Cost
Figure 9.7: Formula and Layout Changes During Retrospection Experiments
The results from this section are significant for three reasons. Firstly, fewer op-
erations are needed when using our system, which means faster construction times
and fewer chances for accidental errors. Secondly, our system requires no formulae
or layout changes. Prior research has shown that formulae are the principal source of
spreadsheet errors. However, layout changes are also significant because they require
the user to alter their mental map of the spreadsheet. This potentially leads to com-
prehension issues and increases the likelihood of incorrect cell references. Thirdly, it
made little difference in our system whether information uncertainty had been antici-
pated or not. In contrast, traditional spreadsheet methods had significant formula and
layout costs for introducing and removing information uncertainty.
9.3 Sensitivity Analysis Surveys

Two studies were conducted using a sensitivity analysis scenario. The first sought subjective feedback from participants drawn from a variety of backgrounds. The second study was designed to gain a more detailed evaluation from financial experts.
9.3.1 First Survey

In the first survey, a sensitivity analysis scenario was sent to participants, whose backgrounds are listed in Table 9.4. The survey is attached in Appendix A.
3 financial analysis experts
3 frequent spreadsheet users
1 mathematics expert
4 skilled computer users (who did not frequently use spreadsheets)
Table 9.4: Respondents to the Survey
The examples were produced using IvySheet as at Feb 2007. The graphs are uncertainty line graphs that use the UUM, as described in Section 6.3.3. They were embedded within the spreadsheet and displayed using floating observers. The questions and selected responses follow.
1. Did the interval approach make sense to you? This question was answered
positively by all respondents. However, one of the finance experts commented that
intervals can be achieved
“using a sensitivity matrix ie variables across one axis and results across
the other. Often times this can be more useful because you can see how
sensitive the result is to the change in variable.”
They later noted that a probability distribution would address this situation. Addition-
ally, they went on to say that
“I can see that presenting in this new way is powerful when it comes to
planning because you can show the range of probable outcomes as op-
posed to a whole series of data points.”
which indicates that the polygonal visualization was useful. One of the financial ex-
perts described the system as “very intuitive”.
2. Would you consider this a more useful tool than a standard spreadsheet? This
question was answered positively with one exception: one of the financial expert re-
spondents was undecided. They described their position as,
“Not sure that I’d say its more useful than a standard spreadsheet but it
presents the data very clearly which could be useful for presentations.”
The other financial expert commented that
“This tool will save much time ... It requires a single edit instead of itera-
tive input changes and structure changes to store the results. Even MatLab
requires more effort when investigating variables that are uncertain.”
3. Would you like to use a tool like this? This question was answered positively
with one exception: the infrequent spreadsheet user indicated that they would not use
this system. Their reason was:
“Not a visualizing person.”
The mathematics expert answered positively and gave the following reason:
“Because it gives an immediate visual feedback on the uncertainty.”
4. What would you change? Many of the respondents shared the view that prob-
ability distributions would be particularly useful. Notably, all financial experts made
this clear, for example:
“I would like to apply probability distributions & see the results.”
The mathematics expert expressed a desire to
“specify a probability distribution over the interval (or the real line) and
then visualize this ... by colour/shading intensity. [T]here could be a stan-
dard set of distributions to choose from e.g. Gaussian, binomial, poisson,
etc.”
Two of the frequent spreadsheet users also suggested an interactive brushing on the
graph, so as to determine which variances contributed to that point.
5. Do you have any other comments about what you have seen? Responses to
this question primarily expanded on answers to question 4 above. One of the skilled
computer users asked the following question:
“Can I manipulate the data graphically (dragging the line/s in the graph)
instead of typing in variances via the spreadsheet?”
This again implies that interactivity in the visualization is worthy of future work. The
mathematics expert warned against using the term variance to describe intervals.
9.3.2 Second Survey

For the second study, three financial experts were asked to complete tasks using the software and answer questions on a Likert scale [22]. The participants were selected for their familiarity with sensitivity analysis techniques.
The materials used for the experiment were:
• 1 PC
– OS: Microsoft Windows XP SP2 (32-bit)
– Java Runtime: Sun Microsystems JRE 1.6.0_01
– CPU: AMD Athlon 64 X2 Dual Core 2 GHz
– RAM: 1 GB
– Display: 1680x1050, 32-bit color
• IvySheet as at March 2007, our prototype system
• Entrance Questionnaire
• Description of Tasks
• Exit Questionnaire
• Observations Sheet
The method used in the experiment is as follows.
1. Participants were first given an entrance questionnaire to determine their background experience.
2. They were then seated before a new instance of the IvySheet software and asked
to perform three main tasks. During this time they were observed and their
performance was recorded.
3. Upon completion of the tasks, the participants were asked to complete an exit
questionnaire.
The questionnaires, tasks, observations sheet, and results are attached in Appendix B.
Three financial experts volunteered to take part in the study. One of the participants
(RS) had also taken part in the first survey, the other two had not. The entrance ques-
tionnaire asked the participants to rate their experience in four areas on a four-point
scale (1-None, 2-Beginner, 3-Intermediate, 4-Advanced). Figure 9.8 shows the aver-
age rating. Volunteers considered themselves proficient with spreadsheets and finan-
cial techniques, but rated their familiarity with visualization tools between none and
beginner level. There was a reasonable degree of confidence with uncertainty methods.
These results suggest that the participants are well placed to evaluate comprehension
of the uncertainty propagation and would have little trouble using the spreadsheet soft-
ware.
[Bar chart: average self-rated experience (scale 1.00–4.00) in Spreadsheets, Visualization, Uncertainty, and Financials.]
Figure 9.8: Background of Respondents
The times taken to conduct each task were measured to the nearest minute. This
includes the time taken to read the instructions. The timer was stopped when a par-
ticipant signalled that they had completed the step and were confident of their results.
Figure 9.9 shows the average time to complete each step. The first step involves build-
ing a spreadsheet for the scenario. It was the most time consuming step and participants
spent approximately one third of their time checking their work. The second step was
to create a graph, and this was the quickest of the steps. The only significant amount of
time was spent by one participant, who then requested assistance. This was because
they were unsure whether to label the axes. The majority of time in the third step was
spent by participants checking their results.
Participants were asked to ensure that their work was correct before signalling that
they had completed a step. Correctness was then verified by the observer, who recorded
[Bar chart: average time to complete Steps 1–3, in minutes (scale 0–10).]
Figure 9.9: Average Completion Time
the number of errors made. Any errors were corrected by the observer before proceeding to the next step. During the course of performing a step the participants would spot and correct errors in their own work. These events were recorded as corrections and not as errors. Only a single uncorrected error was found, which was due to an oversight error while composing a formula in the first step. This type of oversight error is common in spreadsheet tasks [88, 87] and was not related to uncertainty comprehension. The lack of errors, especially in the second and third steps, indicates that participants were comfortable and proficient with the software and tasks.
Upon completion, the participants were asked to complete an exit questionnaire
that used a five-point Likert scale (1-Strongly Disagree, 2-Disagree, 3-Neither Agree
nor Disagree, 4-Agree, 5-Strongly Agree). Figure 9.10 shows the questions and average responses. There was general congruence in the responses, with an average standard deviation of only 0.43.¹
These results show that the participants found the encapsulation of uncertainty details intuitive and felt that it aided comprehension (Q1-Q4). They did not have issues with the additional information (Q4, Q7, Q8). The real-time graph was marginally useful, but would likely find greater use in more complex scenarios (Q5, Q6). The participants wanted to do more uncertainty modeling than they currently do, and they expressed a desire to use a tool like IvySheet (Q9-Q13). In the comments section, one of the participants noted that they primarily use Monte Carlo techniques for dealing with uncertainty.
¹ Detailed results are listed in Appendix B.
1. Overall, I found the system to be intuitive
2. Placing an interval in a cell made sense to me
3. I would need more training to complete the tasks
4. I found it easy to follow the effect that an interval had on the rest of the spreadsheet
5. The graph helped me to understand the effects better
6. I would definitely use the graph if there were many numbers
7. The spreadsheet became too cluttered or hard to read
8. This tool makes it easier to make mistakes
9. I would use a tool like this if it were available to me
10. I would use intervals in Microsoft Excel if they were supported
11. My current tools are adequate for modeling and visualizing uncertainty
12. I would use normal probability distributions more often in my work if I had a tool like this
13. I would model uncertainty more often if I had these features available
Figure 9.10: Questions and Average Responses
9.4 Case Study: Business Planning

Virtually every commercial enterprise will engage in business planning. Business plans are used to explore future directions, align strategic objectives, and support applications for financing. Good business plans are not only intricate, but also need to clearly communicate their contents. The nature of a business plan is to explore the unknowns of the future and how these will be dealt with [7].
This case study explores the application of our information uncertainty modeling
and visualization framework to business planning. Business plans can be divided into
three main parts: the strategic plan, which includes goals and management; the mar-
keting plan, which includes market research and analysis; and the operational plan,
which includes operations and finance.
Information uncertainty is most evident in the Marketing plan. The Marketing plan involves predicting market trends, potential sales, and similarly uncertain information. These uncertainties flow on to the Operational and Strategic plans, which specifically outline how these challenges will be met. The Operational plan explores survivability, profitability, exit points, and financing needs based on differing market conditions. The Strategic plan incorporates management issues, such as staffing and resources, which can include information uncertainty.
In this case study we consider the business plan for XYZ Software,² a newly formed spin-off company from an existing publishing company, ABC Publishing. ABC
² These names are fictional.
Publishing has a long history of publishing scientific books and journals, and therefore
they have access to market information for print-based publications. However, XYZ
Software plans to work in a new space: the digital publication field, which is growing
rapidly and for which the data is not yet stable. The business plan is being prepared to
raise funding for the new venture, which will be run as an entirely separate company
to ABC Publishing.
A spreadsheet was built to provide the supporting information for the business plan.
We now analyze the advantages of our system for this purpose.
Hierarchical Structure Business plans tend to include large spreadsheets with many
calculations. To alleviate complexity, we use hierarchical spreadsheets to build a struc-
ture that groups information into useful categories. Figure 9.11 shows the hierarchical
structure for XYZ’s business plan. This makes access to relevant information more in-
tuitive than selecting the sheet from a linear list. Furthermore, the sheet names tend to
be longer when arranged in a linear list as they do not benefit from the implied labeling
given by hierarchical composition. For example, the Year 1 sheet under Balance Sheet
would otherwise likely be given the name “Balance Sheet Y1”.
Figure 9.11: Hierarchical Business Plan Spreadsheet
Incremental Refinement The spreadsheet can be built top-down, since details could
be left to subsheets. For example, in Figure 9.12, the cell in B1 was originally a high
level estimated value. As the work on the business plan progressed, this cell was
replaced by an embedded spreadsheet, the contents of which is shown in Figure 9.13.
Figure 9.12: Market Share Overview with Embedded Sheets
The uncertainty details can similarly be refined as information becomes available. The predicted values for market share can initially be modeled using single-valued estimates, then promoted to normal distributions as more detailed predictions are made.
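This refinement workflow can be sketched as follows. The `Normal` class and the `projected_share` formula are illustrative names of our own, not IvySheet's actual API; the point is that the downstream formula is written once and survives the promotion of an input from a single-valued estimate to a distribution.

```python
import math
from dataclasses import dataclass

@dataclass
class Normal:
    """Illustrative normally distributed cell value (mean, standard deviation)."""
    mu: float
    sd: float

    def __add__(self, other):
        # Promote a plain number to a zero-spread normal, then add.
        o = other if isinstance(other, Normal) else Normal(float(other), 0.0)
        return Normal(self.mu + o.mu, math.hypot(self.sd, o.sd))
    __radd__ = __add__

def projected_share(segment_a, segment_b):
    # Downstream formula: never revised during refinement.
    return segment_a + segment_b

# Initial plan: single-valued estimates (e.g. subscribers, in thousands).
print(projected_share(120, 80))                # 200

# Later: one input is promoted to a distribution; the formula is unchanged.
print(projected_share(Normal(120, 15.0), 80))  # Normal(mu=200.0, sd=15.0)
```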
Compensate for Missing Details The first step in the marketing plan is to establish
the target market. A survey of ABC Publishing customers who would use the new
XYZ Software service was broken down by age. Complete information on their classi-
fication into customer types was unavailable and consequently this was estimated using
intervals. These intervals compensate for the fact that the information was not known.
The target market is shown in Figure 9.14.
Summary Observation Sometimes it is convenient to keep an eye on a high level
summary value while building lower level contributions. Floating observers can ob-
serve any cell and therefore a floating observer can be created for the high level sum-
mary value. Traditional spreadsheets support a “split view”, where the window is
divided into viewports. However, this feature extends to the current sheet only and is
of no use when the summary value is on another page. The floating observer is created
in its own window and therefore is unaffected by which spreadsheet is selected.
Adding a New Uncertainty Modeling Technique In certain circumstances a value must not dip below zero; percentages are one example. It was still desirable to model these as normal distributions, but clamped to only positive values. One solution is to add a new modeling technique, called a clamped normal, for which any value outside a range is clamped.
Figure 9.13: Market Share for Year 1
Figure 9.14: Target Market
Simplified Propagation A promoter is defined for promoting a normal distribution
to a clamped normal with the range [−∞,+∞]. A propagation method for arithmetic
between clamped normals is added to the propagation model. All other combinations
are handled using hierarchical heterogeneous propagation. Thus, arithmetic between a
quantity and a clamped normal results in the quantity x being promoted to a clamped
normal p = (μ = x, σ = 0, [−∞,+∞]) before the operation is carried out.
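A sketch of this arrangement follows; the class and function names, and the rule for combining clamp ranges, are our own illustrative choices rather than IvySheet's implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Normal:
    mu: float
    sd: float

@dataclass
class ClampedNormal:
    """A normal distribution whose values are clamped to [lo, hi]."""
    mu: float
    sd: float
    lo: float = -math.inf
    hi: float = math.inf

def promote(value):
    """Promoter: lift plain quantities and normals to clamped normals."""
    if isinstance(value, ClampedNormal):
        return value
    if isinstance(value, Normal):
        return ClampedNormal(value.mu, value.sd)   # range [-inf, +inf]
    return ClampedNormal(float(value), 0.0)        # quantity x -> (mu=x, sd=0)

def add(a, b):
    """One propagation rule, for addition between clamped normals.
    Mixed operands are promoted first (heterogeneous propagation).
    Summing the clamp bounds elementwise is one plausible choice."""
    a, b = promote(a), promote(b)
    return ClampedNormal(a.mu + b.mu, math.hypot(a.sd, b.sd),
                         a.lo + b.lo, a.hi + b.hi)

# Arithmetic between a quantity and a clamped normal: the quantity is
# promoted to (mu=5, sd=0, [-inf, +inf]) before the operation.
result = add(5.0, ClampedNormal(10.0, 2.0, 0.0, math.inf))
print(result.mu, result.sd)   # 15.0 2.0
```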
Probabilistic Break-even The most critical part of a business plan is the break-even analysis. Break-even analyses are carried out before any venture is embarked upon, as they are an instrumental indicator of potential success. Using traditional methods,
the break-even analysis is performed under several scenarios, which is a numerical
approach to information uncertainty. The break-even analysis shows the cumulative
profitability over the projected period (e.g. Figure 9.15). The point at which the prof-
itability becomes non-negative is the break-even point, in this case the second quarter
of Year 3. Using our system we have been able to model most variables using normal
distributions. The interactions between these numerous normally distributed independent variables culminate in a normally distributed profit prediction. This allows us to
plot the probability of profitability, as shown in Figure 9.16. As can be seen from the
graph, we are likely to become profitable in the third year, but even after 3 years there
is still a chance that the venture will not yet be profitable.
Figure 9.15: Typical Break-even Visualization
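The probability plotted in such a graph follows directly from the normal profit prediction: for cumulative profit distributed N(μ, σ²), the probability of being non-negative is Φ(μ/σ), where Φ is the standard normal CDF. A minimal sketch with hypothetical quarterly figures:

```python
import math

def prob_profitable(mu, sd):
    """P(profit >= 0) when cumulative profit ~ N(mu, sd^2)."""
    return 0.5 * (1.0 + math.erf(mu / (sd * math.sqrt(2.0))))

# Hypothetical cumulative-profit predictions (mean, std. dev., in $k).
quarters = [(-120, 30), (-40, 35), (25, 40), (80, 45)]
for mu, sd in quarters:
    print(f"mu={mu:5d}  P(profitable)={prob_profitable(mu, sd):.2f}")
```

As the mean crosses zero the probability passes 0.5, but the tails make clear that even quarters with a positive expected profit carry a real chance of remaining unprofitable.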
In summary, IvySheet provided several benefits over using a traditional spreadsheet
for business planning. The naturally hierarchical structure of business planning could
be transferred to the spreadsheet. Planning could proceed before all details were avail-
able and the details could be added as they became available. Uncertainty information
could be added without significant alterations and where a new technique was required,
an extension for it could be created. Due to the propagation of the uncertainty details,
the spreadsheet provided more information than its traditional counterparts. This en-
ables better informed support tools, such as the probabilistic break-even analysis.
Figure 9.16: Probabilistic Break-even Visualization
CHAPTER 10
Conclusion and Future Work
“It is not certain that everything is uncertain.”
– Blaise Pascal (Pensées, 1670)
10.1 Achievements

In this thesis we have introduced an integrated information uncertainty modeling and
visualization system, which intrinsically supports information uncertainty modeling,
automated uncertainty propagation, and uncertainty model abstracted visualization.
This significantly lowers the barrier to the uptake of information uncertainty modeling
and visualization.
There are three significant benefits. Firstly, users are able to build their data model using whichever uncertainty modeling techniques are appropriate. They are no longer limited by prior modeling choices, enabling them to capture a greater amount of information, and therefore improving the fidelity of the data model. Secondly, the system is
intrinsically aware of the uncertainty, protecting it against user-induced errors. Users do not require detailed comprehension of the uncertainty mechanics in order to use a modeling technique, since these are handled automatically. Lastly, visualization techniques have been abstracted from the underlying uncertainty data type, enabling them to continue to function when the uncertainty modeling technique changes. This enables users to create visualizations in the face of changing uncertainty.
We extended the spreadsheet paradigm to intrinsically support information uncer-
tainty. We take a two-fold approach to this: encapsulation of uncertainty information
and abstraction of uncertainty for visualization.
Encapsulation
In this thesis, we argued for the encapsulation of information and its uncertainty, where
a variable and its uncertainty are treated as a unit. We devised the uncertainty prop-
agation model, which enables automated propagation of information uncertainty. By
using encapsulation, uncertainty details are treated at the sub-variable level and can be
added, changed, or removed at any point in the process. This means that users do not
have to anticipate their uncertainty requirements before building the data model, and
the uncertainty information can be easily revised as more becomes known.
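The essence of the encapsulation argument can be sketched with a toy interval type (our own illustration, not the thesis's propagation model in full): the quantity and its uncertainty travel as one unit, so formulas need no revision when uncertainty is added, changed, or removed.

```python
from dataclasses import dataclass

@dataclass
class Uncertain:
    """Toy encapsulated cell value: a quantity plus interval half-widths."""
    value: float
    below: float = 0.0
    above: float = 0.0

    def __add__(self, other):
        o = other if isinstance(other, Uncertain) else Uncertain(float(other))
        return Uncertain(self.value + o.value,
                         self.below + o.below, self.above + o.above)
    __radd__ = __add__

    def __mul__(self, k):  # scaling by an exact constant
        return Uncertain(self.value * k, self.below * k, self.above * k)

# The revenue formula is written once; propagation is automatic.
price = Uncertain(9.5, 0.5, 0.5)   # uncertainty lives at the sub-variable level
units = 1000
print(price * units)               # Uncertain(value=9500.0, below=500.0, above=500.0)

# Removing the uncertainty later changes only the data, never the formula.
print(Uncertain(9.5) * units)      # Uncertain(value=9500.0, below=0.0, above=0.0)
```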
Abstraction
We designed uncertainty abstraction models, which form the interface between model-
ing techniques and visual mappings. By employing an uncertainty abstraction model,
visualization techniques can be developed for abstract plural values, freeing them from
dependence on any particular data type. To guide users in the selection of appropri-
ate visual mappings, we defined user-objectives for uncertainty visualization. User-
objectives provide a data type abstracted means of describing visualizations and we
explored a user interface for selecting a user-objective.
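One minimal way to sketch such an abstraction model (the interface shown is our own illustration; the thesis's abstract plural values are richer than a three-tuple): each modeling technique reduces itself to a common summary, and a visual mapping consumes only that summary, so it keeps working when the technique changes.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float
    def abstract(self):
        # (representative, lower, upper)
        return ((self.lo + self.hi) / 2, self.lo, self.hi)

@dataclass
class Normal:
    mu: float
    sd: float
    def abstract(self):
        # representative plus an approximate 95% band
        return (self.mu, self.mu - 2 * self.sd, self.mu + 2 * self.sd)

def render_band(cells):
    """A visual mapping that never inspects the concrete uncertainty type."""
    return [cell.abstract() for cell in cells]

# The same graph definition survives a change of modeling technique.
print(render_band([Interval(8, 12), Normal(10.0, 1.0)]))
```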
10.2 Limitations

Our integrated information uncertainty modeling and visualization system is designed for a particular type of uncertainty: that which can in some way be described for a particular unit of information. This is a fundamental limitation, as our approach seamlessly integrates information with its uncertainty. Any uncertainty that cannot be so defined is therefore unsuitable for our approach.
10.3 Possible Applications and Extensions

The spreadsheet system presented here can be most readily applied in fields where spreadsheets and information uncertainty are both already in common usage. For example, the financial markets, marketing, and planning fields are natural targets. It would be worth exploring the development of our approach as an add-on to existing commercial spreadsheet software, as this would enable users to leverage their existing systems.
Many other fields make use of information uncertainty for a variety of tasks, such
as planning and predicting. In many instances custom software or script-based math-
ematical software is used (e.g. MatLab). These users suffer the same drawbacks as
traditional spreadsheet users do and would similarly benefit from our framework. Fur-
ther work might consider exploring the encapsulation and abstraction approaches for
such custom and script-based systems.
The visual mapping of uncertainty information is done using the formula language.
Future work could explore user interfaces to aid the visual mapping process. The user
interface design for selecting user-objectives would provide a good starting point. How
can this be expanded to effectively aid the user to construct useful visualizations? Can
the system learn user preferences and thereby improve the options shown?
The prototype system relied on a text entry mechanism for the user to specify the
uncertainty and information for a particular variable. New interfaces might be de-
veloped to explore more intuitive ways of eliciting uncertainty information from the
user. For example, a graphical interface that enables users to draw a fuzzy membership
function may be appropriate.
Interaction with visualizations has previously been shown to be useful. For exam-
ple, brushing is commonly available even in graphs produced by traditional spread-
sheet systems. As comments from the survey indicated, interactivity is a feature that
users value, particularly when it comes to uncertainty. Further work should be carried
out to investigate how interaction can be incorporated into the current framework. For
example, is it appropriate to set the value of a cell elsewhere in the spreadsheet, as
per [65]?
Abstraction from the uncertainty information for visualization opens an opportu-
nity for research into visualization techniques using this approach. For example, how
might a pie chart or a tree map be extended to incorporate plural values?
Bibliography
[1] Ken Arnold, James Gosling, and David Holmes. The Java programming language. Addison-Wesley, 4th edition, 2006.
[2] Bilal M. Ayyub and George J. Klir. Uncertainty modeling and analysis in engineering and the sciences. Chapman & Hall/CRC, Boca Raton, FL, 2006.
[3] Humberto Barreto and Frank Howland. Introductory Econometrics: Using
Monte Carlo Simulation with Microsoft Excel. Cambridge University Press,
2006.
[4] L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva,
and H. T. Vo. Vistrails: enabling interactive multiple-view visualizations. In
Visualization, 2005. VIS 05. IEEE, pages 135–142, 2005.
[5] M. Berthold and L. Hall. Visualizing fuzzy points in parallel coordinates. IEEE
Transactions on Fuzzy Systems, 11:369–374, June 2003.
[6] M. R. Berthold and R. Holve. Visualizing high dimensional fuzzy rules. pages
64–68, 2000.
[7] Edward Blackwell. How to prepare a business plan. Kogan Page, 2004.
[8] Grady Booch, James Rumbaugh, and Ivar Jacobson. The unified modeling lan-
guage user guide. Addison-Wesley, 1999.
[9] Polly S. Brown and John D. Gould. An experimental study of people creating
spreadsheets. ACM Transactions on Office Information Systems, 5(3):258–272,
July 1987.
[10] R. Brown. Animated visual vibrations as an uncertainty visualization technique.
In International Conference on Graphics and Interactive Techniques in Aus-
tralasia and South East Asia, pages 84–89. ACM, June 2004.
[11] R. Brown and B. Pham. Visualisation of fuzzy decision support information: A
case study. In IEEE International Conference on Fuzzy Systems, pages 601–606,
St Louis, 2003.
[12] M. M. Burnett, M. J. Baker, C. Bohus, P. Carlson, S. Yang, and P. Van Zee.
Scaling up visual programming languages. Computer, 28(3):45–54, 1995.
[13] Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger,
Claudio T. Silva, and Huy T. Vo. Vistrails: visualization meets data manage-
ment. In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD International
Conference on Management of Data, pages 745–747, New York, NY, USA,
2006. ACM Press.
[14] Alexander Campbell, Erik Berglund, and Alexander Streit. Graphics hardware
implementation of the parameter-less self-organising map. Intelligent Data En-
gineering and Automated Learning - IDEAL 2005, pages 343–350, 2005.
[15] S. K. Card and J. Mackinlay. The structure of the information visualization
design space. pages 92–99, 125, 1997.
[16] E. Chi. A taxonomy of visualization techniques using the data state reference
model. In IEEE Symposium on Information Visualization, pages 69–75. IEEE
Press, Oct 2000.
[17] E. H. Chi and S. K. Card. Sensemaking of evolving web sites using visualization
spreadsheets. pages 18–25, 142, 1999.
[18] Ed Huai-Hsin Chi and J. T. Riedl. An operator interaction framework for visu-
alization systems. pages 63–70, 1998.
[19] Helen Couclelis. The certainty of uncertainty: Gis and the limits of geographic
knowledge. Transactions in GIS, 7(2):165–175, 2003.
[20] Z. Cox, J.A. Dickerson, and D. Cook. Visualizing membership in multiple clus-
ters after fuzzy c-means clustering. In R. Erbacher, P. Chen, J. Roberts, C. Wit-
tenbrink, and M. Grohn, editors, Visual Data Exploration and Analysis VIII,
volume 4302, pages 60–68. SPIE Bellingham, Washington, 2001.
[21] I. Cruz. Tailorable information visualization. ACM Computing Surveys, 28(4),
December 1996.
[22] David de Vaus. Analyzing social science data. SAGE, 2002.
[23] J.A. Dickerson, Z. Cox, E.S. Wurtele, and A.W. Fulmer. Creating metabolic and regulatory network models using fuzzy cognitive maps. In M.H. Smith, W.A. Gruver, and L.O. Hall, editors, IFSA World Congress and 20th NAFIPS International Conference, 2001. Joint 9th, volume 4, pages 2171–2176, 2001.
[24] S. Djurcilov, K. Kim, P. Lermusiaux, and A. Pang. Volume rendering data with
uncertainty information. In D. Ebert, J.M. Favre, and R. Peikert, editors, Data
Visualization 2001, pages 243–252, 355–356. Springer, 2001.
[25] S. Djurcilov, K. Kim, P. Lermusiaux, and A. Pang. Visualizing scalar volumetric
data with uncertainty. Elsevier Computers & Graphics, 26:239–248, 2002.
[26] Mark Dodge and Craig Stinson. Microsoft Office Excel 2007 : inside out. Mi-
crosoft Press, 2007.
[27] D. J. Duke, K. W. Brodlie, and D. A. Duce. Building an ontology of visualiza-
tion. pages 7–7, 2004.
[28] David H. Eberly. 3D game engine design : a practical approach to real-time
computer graphics. Morgan Kaufmann, 2001.
[29] EURACHEM and CITAC Measurement Uncertainty Working Group. Quantifying Uncertainty in Analytical Measurement. Eurachem, 2nd edition, 2000.
[30] Marc Fisher II, Gregg Rothermel, Darren Brown, Mingming Cao, Curtis Cook, and Margaret Burnett. Integrating automated test generation into the WYSIWYT spreadsheet testing methodology. ACM Trans. Softw. Eng. Methodol., 15(2):150–194, 2006.
[31] P. Fisher. Visualizing uncertainty in soil maps by animation. Cartographica,
30(2+3):20–27, 1993.
[32] Y. Fujiwara, M. Shirashi, D. Nakagawa, and S. Okada. Visualization of the
rule-based program by a 3d flowchart. In 6th International Conference on Fuzzy
Theory and Technology (JCIS), volume 3, pages 250–254, NC, USA, Oct 1998.
[33] N. Gershon. Visualization of an imperfect world. Computer Graphics and
Applications, IEEE, 18(4):43–45, 1998.
[34] N. D. Gershon. Visualization of fuzzy data using generalized animation. pages
268–273, 1992.
[35] John Geweke, William McCausland, and John Stevens. Using Simulation Meth-
ods for Bayesian Econometric Models, chapter 8. Marcel Dekker, Inc., 2003.
[36] M.F. Goodchild, D.R. Montello, P. Fohl, and J. Gottsegen. Fuzzy spatial queries
in digital spatial data libraries. In IEEE World Congress on Computational
Intelligence Fuzzy Systems Proceedings, volume 1, pages 205 – 210, 4-9 May
1998.
[37] G. Grigoryan and P. Rheingans. Probabilistic surfaces: point based primitives to
show surface uncertainty. In Visualization, 2002. VIS 2002. IEEE, pages 147–
153, 2002.
[38] G. Grigoryan and P. Rheingans. Point-based probabilistic surfaces to show sur-
face uncertainty. Visualization and Computer Graphics, IEEE Transactions on,
10(5):564–573, 2004.
[39] L. O. Hall and M. R. Berthold. Fuzzy parallel coordinates. pages 74–78, 2000.
[40] Charles D. Hansen and Chris Johnson. Visualization Handbook. Academic
Press, December 2004.
[41] S. Henderson. Vised: Visualization techniques, 1996. Retrieved 25 June 2004.
[42] T. Hengl. Visualisation of uncertainty using the hsi colour model: computations
with colours. In 7th International Conference on GeoComputation, page 8,
2003.
[43] Hugues Hoppe. Progressive meshes. In SIGGRAPH ’96: Proceedings of
the 23rd annual conference on Computer graphics and interactive techniques,
pages 99–108, New York, NY, USA, 1996. ACM Press.
[44] D. Howard and A. M. MacEachren. Interface design for geographic visualiza-
tion: Tools for representing reliability. Cartography and Geographic Informa-
tion Systems, 23(2):59–77, 1996.
[45] Ed Huai-hsin Chi, Joseph Konstan, Phillip Barry, and John Riedl. A spreadsheet
approach to information visualization. In UIST ’97: Proceedings of the 10th
Annual ACM Symposium on User Interface Software and Technology, pages
79–80, New York, NY, USA, 1997. ACM Press.
[46] Ed Huai-hsin Chi, John Riedl, Phillip Barry, and Joseph Konstan. Principles for
information visualization spreadsheets. IEEE Computer Graphics and Applica-
tions, 18(4):30–38, 1998.
[47] G.J. Hunter and M.F. Goodchild. Managing uncertainty in spatial databases:
Putting theory into practice. Journal of the Urban and Regional Information
Systems Association, 5(2):55–62, 1993.
[48] Tomas Isakowitz, Shimon Schocken, and Henry C. Lucas, Jr. Toward a logical/physical theory of spreadsheet modeling. ACM Trans. Inf. Syst., 13(1):1–37, 1995.
[49] T. J. Jankun-Kelly, K.-L. Ma, and M. Gertz. A model for the visualization
exploration process. In IEEE Visualization, pages 323–330, 2002.
[50] B. Jiang, J. L. Wang, and Y. C. Soh. Robust fault diagnosis for a class of bilinear systems with uncertainty. In IEEE Conference on Decision and Control, volume 5, pages 4499–4504. IEEE, 1999.
[51] C.R. Johnson and A.R. Sanderson. A next step: Visualizing errors and un-
certainty. IEEE Computer Graphics and Applications, 23(5):6–10, Septem-
ber/October 2003.
[52] D. Kao, J. Dungan, and A. Pang. Visualizing 2d probability distributions from
eos satellite image-derived data sets: A case study. In IEEE Visualization, pages
457–460, 2001.
[53] David Kao, Alison Luo, Jennifer L. Dungan, and Alex Pang. Visualizing spatially varying distribution data. In Sixth International Conference on Information Visualisation (IV 2002), page 219, 2002.
[54] David L. Kao, Marc G. Kramer, Alison L. Love, Jennifer L. Dungan, and Alex T.
Pang. Visualizing distributions from multi-return lidar data to understand forest
structure. Cartographic Journal, The, 42:35–47(13), June 2005.
[55] A. Keller. Fuzzy clustering with outliers. pages 143–147, 2000.
[56] P. Keller and M. Keller. Visual Cues. IEEE Press, 1992.
[57] B. Kitchenham, L.M. Pickard, S. Linkman, and P.W. Jones. Modeling soft-
ware bidding risks. IEEE Transactions on Software Engineering, 29(6):542–
554, June 2003.
[58] George Klir and Richard Smith. On measuring uncertainty and uncertainty-
based information: Recent developments. Annals of Mathematics and Artificial
Intelligence, 32(1):5–33, 2001.
178 BIBLIOGRAPHY
[59] George J. Klir. Generalized information theory: aims, results, and open prob-
lems. Reliability Engineering & System Safety, 85(1-3):21–38, 2004.
[60] George J. Klir. Uncertainty and Information: Foundations of Generalized In-
formation Theory. Wiley-Interscience, 2005.
[61] G.J. Klir. An update on generalized information theory. In Third International
Symposium on Imprecise Probabilities and Their Applications, number 18 in
Proceedings in Informatics, Lugano, Switzerland, 2003. Carleton Scientific.
[62] G.J. Klir and D. Harmanec. Generalized information theory: recent develop-
ments. Kybernetes, 25(7/8):50–67, 1996.
[63] G.J. Klir and M.J. Wierman. Uncertainty-Based Information: Elements of Gen-
eralized Information Theory. Springer, 1999.
[64] R. Kosara, S. Miksch, and H. Hauser. Focus+context taken literally. IEEE
COMPUTER GRAPHICS AND APPLICATIONS, 22:22–29, 2002.
[65] Marc Levoy. Spreadsheets for images. In SIGGRAPH ’94: Proceedings of
the 21st Annual Conference on Computer Graphics and Interactive Techniques,
pages 139–146, New York, NY, USA, 1994. ACM Press.
[66] S. K. Lodha, C. M. Wilson, and R. E. Sheehan. Listen: Sounding uncertainty
visualization. In Visualization, pages 189–95, San Francisco, California, 1996.
IEEE.
[67] S.K. Lodha, A. Pang, R.E. Sheehan, and C.M. Wittenbrink. Uflow: Visualizing
uncertainty in fluid flow. IEEE Visualization’96, pages 249–254, 1996.
[68] Suresh K. Lodha, Bob Sheehan, Alex T. Pang, and Craig M. Wittenbrink. Vi-
sualizing geometric uncertainty of surface interpolants. In Wayne A. Davis
and Richard Bartels, editors, Graphics Interface ’96, pages 238–245. Canadian
Human-Computer Communications Society, 1996.
[69] Adriano Lopes and Ken Brodlie. Mathematical Visualization, chapter Accuracy
in 3D Particle Tracing, pages 329–341. Springer-Verlag, Heidelberg, 1998.
[70] A. Lowe, R. Jones, and M. Harrison. The graphical presentation of decision
support information in an intelligent anaesthesia monitor. Artificial Intelligence
in Medicine, 22:173–191, 2001.
[71] Jan Łukasiewicz. Elements of Mathematical Logic. Translated from Polish by
Olgierd Wojtasiewicz. Macmillan, New York, 1964.
BIBLIOGRAPHY 179
[72] Alison Luo, David Kao, and Alex Pang. Visualizing spatial distribution data
sets. In VISSYM ’03: Proceedings of the symposium on Data visualisation
2003, pages 29–38, Aire-la-Ville, Switzerland, Switzerland, 2003. Eurographics
Association.
[73] K.-L. Ma. Visualization - a quickly emerging field. ACM Computer Graphics,
February:4–7, 2004.
[74] Alan M. MacEachren, Anthony Robinson, Susan Hopper, Steven Gardner,
Robert Murray, Mark Gahegan, and Elisabeth Hetzle. Visualizing geospatial
information uncertainty: What we know and what we need to know. Cartogra-
phy and Geographic Information Science, 32(3):139–160, July 2005.
[75] The MathWorks. Fuzzy Logic Toolbox For Use with MATLAB: User’s Guide
ver. 2. The MathWorks, Inc., 1999.
[76] Don L. McLeish. Monte Carlo Simulation and Finance. John Wiley & Sons,
Inc., 2005.
[77] J. M. Mendel. Uncertain Rule-Based Fuzzy Logic Systems. Prentice Hall PTR,
2001.
[78] Microsoft. Description of the stdev function in excel 2003. Help and Support
Article 826349, January 2007.
[79] Hung T. Nguyen and E. Walker. A first course in fuzzy logic. Chapman & Hall,
Boca Raton, FL, 2000.
[80] A. Nurnberger, A. Klose, and R. Kruse. Discussing cluster shapes of fuzzy
classifiers. In 18th International Conference of the North American Fuzzy In-
formation Processing Society, pages 546–550, July 1999.
[81] A. Nurnberger, A. Klose, and R. Kruse. Effects of antecedent pruning in
fuzzy classification systems. In Proceedings of 4th International Conference
on Knowledge-Based Intelligent Engineering Systems & Allied Technologies,
pages 154–157, 2000.
[82] J. Ohene-Djan, A. Sammon, and R. Shipsey. Colour spectrums of opinion: An
information visualisation interface for representing degrees of emotion in real
time. In Information Visualization, pages 80–88, July 2006.
[83] C. Olston and JD Mackinlay. Visualizing data with bounded uncertainty. Infor-
mation Visualization, 2002. INFOVIS 2002. IEEE Symposium on, pages 37–40,
2002.
180 BIBLIOGRAPHY
[84] A. Pang and N. Alper. Bump mapped vector fields. 1995.
[85] Alex Pang and Adam Freeman. Methods for comparing 3d surface attributes.
volume 2656, pages 58–64. SPIE, 1996.
[86] Alex T. Pang, Craig M. Wittenbrink, and Suresh K. Lodha. Approaches to
uncertainty visualization. The Visual Computer, 13:370–390, 1997.
[87] R. R. Panko. Two corpuses of spreadsheet errors. pages 8 pp. vol.1–, 2000.
[88] R. R. Panko and Jr. Halverson, R. P. Individual and group spreadsheet design:
patterns of errors. volume 4, pages 4–10, 1994.
[89] Zdzisaw Pawlak. Rough sets : theoretical aspects of reasoning about data.
Kluwer Academic Publishers, Dordrecht & Boston, 1991.
[90] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann,
1988.
[91] B. Pham and R. Brown. Analysis of visualisation requirements for fuzzy sys-
tems. In Proceedings of the 1st International Conference on Computer Graphics
and Interactive Techniques in Austalasia and South East Asia, pages 181–187,
2003.
[92] B. Pham, A. Streit, and R Brown. Interactive Visualisation: A State-of-the-Art
Survey, chapter Visualisation of Information Uncertainty: Progress and Chal-
lenges. Springer, UK, 2007.
[93] Binh Pham and Ross Brown. Multi-agent approach for visualisation of fuzzy
systems. In International Conference on Computational Science. Lecture Notes
in Computer Science, June 2-4 2003.
[94] Binh Pham and Ross Brown. Visualisation of fuzzy systems: requirements,
techniques and framework. Future Generation Computer Systems, 21(7):1199–
1212, 2005.
[95] Kurt W. Piersol. Object-oriented spreadsheets: the analytic spreadsheet pack-
age. In OOPLSA ’86: Conference Proceedings on Object-oriented Program-
ming Systems, Languages and Applications, pages 385–390, New York, NY,
USA, 1986. ACM Press.
[96] Lech Polkowski. Rough Sets: Mathematical Foundations. Physica-Verlag,
2002.
BIBLIOGRAPHY 181
[97] Bruno R. Preiss. Data structures and algorithms : with object-oriented design
patterns in C++. Wiley, 1999.
[98] Ramana Rao and Stuart K. Card. The table lens: merging graphical and sym-
bolic representations in an interactive focus + context visualization for tabular
information. In CHI ’94: Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, pages 318–322, New York, NY, USA, 1994.
ACM Press.
[99] M. Reed and D. Heller. Olive: Online library of information visualization envi-
ronments. Retrieved 15 May 2004 from ���������������� �� �������,
1997.
[100] T. Reinhardt and N. Pillay. Analysis of spreadsheet errors made by computer
literacy students. pages 852–853, 2004.
[101] L. Reznik and B. Pham. Fuzzy models in evaluation of information uncertainty
in engineering and technology applications. volume 2, pages 972–975 vol.3,
2001.
[102] George G. Robertson, Jock D. Mackinlay, and Stuart K. Card. Cone trees:
animated 3d visualizations of hierarchical information. In Proceedings of the
SIGCHI conference on Human factors in computing systems: Reaching through
technology, New Orleans, Louisiana, United States, 1991. ACM Press.
[103] Daniel M. Russell, Mark J. Stefik, Peter Pirolli, and Stuart K. Card. The cost
structure of sensemaking. In CHI ’93: Proceedings of the SIGCHI conference
on Human factors in computing systems, pages 269–276, New York, NY, USA,
1993. ACM Press.
[104] A.R. Sanderson, C.R. Johnson, and R.M. Kirby. Display of vector fields using
a reaction-diffusion model. In Visualization, 2004. IEEE, pages 115–122, 2004.
[105] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit, Third
Edition. Kitware Inc., 2004.
[106] Hikmet Senay and Eve Ignatius. Rules and Principles of Scientific Data Visu-
alization. Washington, D.C.: Institute for Information Science and Technology,
Dept. of Electrical Engineering and Computer Science, School of Engineering
and Applied Science, George Washington University, 1990.
[107] G. Shaefer. Perspectives on the theory and practice of belief functions. Interna-
tional Journal of Approximate Reasoning, 3:1–40, 1990.
182 BIBLIOGRAPHY
[108] B. Shneiderman. The eyes have it: a task by data type taxonomy for information
visualizations. In IEEE Symposium on Visual Languages, pages 336–343. IEEE
Press, Sep 1996.
[109] T.A. Slocum, D.C. Cliburn, J.J. Feddema, and J.R. Miller. Evaluating the usabil-
ity of a tool for visualizing the uncertainty of the future global water balance.
Cartography and Geographic Information Science, 30(4):299–318, 2003.
[110] Alexander Streit, Binh Pham, and Ross Brown. Visualization support for man-
aging large business process specifications. In Proceedings 3rd International
Conference, Business Process Management, BPM2005, pages 206–219, Nancy,
France, September 2005. Springer Verlag.
[111] Randall J. Swift and Malempati Madhusudana Rao. Probability Theory with
Applications. Springer, revised edition edition, 2006.
[112] Barry N. Taylor and Chris E. Kuyatt. Guidelines for evaluating and express-
ing the uncetainty of NIST measurement results (1994 edition). Technical
Note 1297, National Institute of Standards and Technology, Gaithersburg, MD,
September 1994.
[113] A Thomas. Visualisation and Modelling, chapter Contouring Algorithms for
Visualisation and Shape Modelling Systems, pages 99–175. Academic Press,
San Diego, USA, 1997.
[114] J. Thomson, E. Hetzler, A. MacEachren, M. Gahegan, and M. Pavel. A typology
for visualizing uncertainty. Proc. SPIE, 5669:146–157, 2005.
[115] M. Tory and T. Möller. Rethinking visualization: A high-level taxonomy. In
IEEE Symposium on Information Visualization, pages 151–158. IEEE Press,
Oct 2004.
[116] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive
Psychology, 12:97–136, 1980.
[117] E. Tufte. The Visual Display of Quantitative Information. Graphics Press,
Cheshire, USA, 1983.
[118] Edward R. Tufte. Envisioning Information. Graphics Press, May 1990.
[119] Wil van der Aalst. Business process management demystified: A tutorial on
models, systems and standards for workflow management. In Lectures on Con-
currency and Petri Nets, volume 3098, pages 1–65. Springer Verlag, Berlin,
2004.
BIBLIOGRAPHY 183
[120] Wil van der Aalst and Arthur ter Hofstede. Yawl: Yet another workflow lan-
guage. In Information Systems, volume 30, June 2005.
[121] Wil M.P. van der Aalst, Arthur H.M. ter Hofstede, and Mathias Weske. Business
process management: A survey. pages 1019–1019, 2003.
[122] Jarke J. van Wijk and Robert van Liere. Hyperslice: visualization of scalar
functions of many variables. In VIS ’93: Proceedings of the 4th Conference on
Visualization ’93, pages 119–125, 1993.
[123] A. Varshney and A. Kaufman. Finesse: a financial information spreadsheet. In
INFOVIS ’96: Proceedings of the 1996 IEEE Symposium on Information Visual-
ization (INFOVIS ’96), page 70, Washington, DC, USA, 1996. IEEE Computer
Society.
[124] B. Wandell. Foundations of Human Vision. Sinauer, Sunderland, USA, 1995.
[125] C. Wittenbrink, A. Pang, and S. Lodha. Glyphs for visualizing uncertainty
in vector fields. IEEE Transactions on Visualization and Computer Graphics,
2(3):266–279, 1996.
[126] L. Zhou and A. Pang. Metrics and visualization tools for surface mesh compari-
son. Technical report, Computer Science Department, University of California,
Santa Cruz, 2000.
184 BIBLIOGRAPHY
APPENDIX A
First Survey
This is the first survey, which was sent out to participants from a variety of fields.
Projecting Sales

A sales projection scenario is given. Sales (estimated) is the projected sales. Expenses is composed of two parts: fixed expenses and sales expenses (a percentage of sales). The net income is calculated as sales minus expenses.
The images below show the scenario using a spreadsheet. There are four quarters. The field labeled “rate” is the sales expenses rate – a percentage of sales.
Now we want to perform a sensitivity analysis on net income based on the sales expenses (the rate field). We wish to investigate the difference that 5 percentage points can make to net income. Only the rate field was changed; the other cells updated automatically, since they are calculated from formulae.
Illustration 1: Typical spreadsheet for the scenario
Illustration 2: Typical line graph
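The scenario's calculation can be sketched in a few lines. This is an illustrative reconstruction, not the survey's actual spreadsheet (which is shown only as images); the quarterly figures are the ones given in the second survey's task data, and rates are expressed in percent.

```python
# Illustrative reconstruction of the scenario's formulae (an assumption,
# based on the description above): net income = sales - (fixed + rate * sales).

SALES = [1200, 1000, 1500, 1800]   # estimated sales for Q1..Q4 (from the task data)
FIXED = 500                        # fixed expenses per quarter (from the task data)

def net_income(sales, rate_pct):
    """Net income = sales - (fixed expenses + sales expenses)."""
    sales_expenses = rate_pct * sales / 100
    return sales - (FIXED + sales_expenses)

# Sensitivity: what difference do 5 percentage points make to net income?
for rate in (10, 15):
    print(rate, [net_income(s, rate) for s in SALES])
# prints:
# 10 [580.0, 400.0, 850.0, 1120.0]
# 15 [520.0, 350.0, 775.0, 1030.0]
```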
To further the example, we wish to investigate the impact that variance in sales might have. We change each sale into an interval:
Illustration 3: Rate is an interval: 15 ± 5%
Illustration 4: Net income is an interval
Illustration 5: Sales and Sales Expenses can vary
The graph shows the increased variance. In the previous examples we have used the variance format (base ± variance), but we can also define intervals using lower and upper bounds. For example:
The graph for this is identical to Illustration 6.
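The two notations describe the same interval, so converting between them is mechanical. A minimal sketch (the function names here are illustrative, not part of the surveyed software):

```python
def variance_to_bounds(base, variance):
    """Convert 'base +- variance' form to 'lower and upper bounds' form."""
    return (base - variance, base + variance)

def bounds_to_variance(lower, upper):
    """Convert bounds form back to (base, variance)."""
    return ((lower + upper) / 2, (upper - lower) / 2)

print(variance_to_bounds(15, 5))   # (10, 20)
print(bounds_to_variance(10, 20))  # (15.0, 5.0)
```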
Questions

The above example shows a simple sensitivity analysis scenario. I would be very appreciative if you could answer the following subjective questions.
1. Did the interval approach make sense to you? Was it intuitive enough to look at the screenshots and follow the logic?
2. For the purpose of sensitivity analysis, would you consider this a more useful tool than a standard spreadsheet?
3. Would you like to use a tool like this? Why / why not?
4. What would you change?
5. (optional) Do you have any other comments about what you have seen?
Illustration 6: Variance is greater when both sales and sales expenses can vary
Illustration 7: Sales are in "lower and upper bounds" format
APPENDIX B
Second Survey
This is the second survey, in which financial experts were asked to complete tasks
using our software. The participants were monitored and were offered assistance when
requested.
IvySheet Survey
Entrance Questionnaire
NAME
Please rate your experience in the following areas (None / Beginner / Intermediate / Advanced):

Spreadsheet Software (e.g. Microsoft Excel)
Visualization Systems (e.g. Spotfire, AVS)
Uncertainty Modeling (e.g. statistical methods)
Financial Modeling (e.g. forecasting, sensitivity analysis)
IvySheet Survey
Task Description
Thank you for taking part in this survey. The purpose of this survey is to evaluate the approaches used to create the IvySheet software package. IvySheet is a spreadsheet system that lets you enter uncertainty information about variables. For example, a variable can be modeled as an interval or a normal probability distribution.
Today you will be asked to create a spreadsheet and graphs for a sales sensitivity scenario. All of the information you need is provided for you below. Some statistics about your usage will be recorded and at the end you will be asked questions about your experience. You may request assistance at any time.
The Task

A sales projection scenario is given:

Sales (estimated) are the projected gross sales.
Fixed expenses are overheads incurred.
Sales expenses are the variable costs incurred. The rate is currently 15% of gross sales, but this will change.
Net income is calculated from sales minus fixed expenses minus variable expenses.
There are four quarters.
The data follow:
Rate: 15%

Quarter   Sales   Fixed expenses
Q1        1200    500
Q2        1000    500
Q3        1500    500
Q4        1800    500
Step 1

Create a spreadsheet for the scenario.

Step 2

Create a line graph for the net income. To do this:

1. Navigate to an empty cell in the spreadsheet.
2. Choose “Create Line Graph” from the “Visualization” menu. You can optionally enter text in the title and axis label fields.
3. In the data field, enter the range of cells. For example, “D4:D8”.
Step 3

Navigate to the rate field and type “[5..10]”. This changes the rate into an interval of values between 5 and 10. Intervals can be entered into any cell using the format [number..number].
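The “[number..number]” cell syntax described in Step 3 can be parsed along these lines. This is a hypothetical sketch; IvySheet's real parser is not shown in this document, and the names here are illustrative.

```python
import re

# Pattern for the "[number..number]" interval syntax, e.g. "[5..10]".
INTERVAL = re.compile(r"\[\s*(-?\d+(?:\.\d+)?)\s*\.\.\s*(-?\d+(?:\.\d+)?)\s*\]")

def parse_cell(text):
    """Return (lower, upper) for an interval entry, or a plain float otherwise."""
    match = INTERVAL.fullmatch(text.strip())
    if match:
        return (float(match.group(1)), float(match.group(2)))
    return float(text)

print(parse_cell("[5..10]"))   # (5.0, 10.0)
print(parse_cell("15"))        # 15.0
```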
Please answer the following questions, making whatever adjustments to the spreadsheet you deem necessary.
1. What are the minimum and maximum amounts for net income in Q4 if the rate is between 10 and 20%?
2. What is the minimum net income for Q3 if the rate is between 10 and 20% AND the sales for that quarter are between 1000 and 2000?
3. What are the minimum and maximum amounts of net income for Q3, if the rate is between 15 and 18%, the sales are between 1500 and 1800, and the fixed expenses are between 500 and 800?
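These three questions can be checked by hand: net income is monotone in each input, so its extremes occur at corners of the input intervals. A sketch of that check, with rates in percent (the expected answers agree with the observation sheet later in this appendix):

```python
from itertools import product

def net_income(sales, rate_pct, fixed):
    """Net income = sales - fixed expenses - sales expenses (rate in percent)."""
    return sales - fixed - rate_pct * sales / 100

def bounds(sales_iv, rate_iv, fixed_iv):
    """Min/max net income over all corners of the three input intervals."""
    corners = [net_income(s, r, f)
               for s, r, f in product(sales_iv, rate_iv, fixed_iv)]
    return min(corners), max(corners)

# Question 1: Q4 sales 1800, fixed 500, rate in [10..20]%
print(bounds((1800, 1800), (10, 20), (500, 500)))   # (940.0, 1120.0)
# Question 2: Q3 sales in [1000..2000], rate in [10..20]%
print(bounds((1000, 2000), (10, 20), (500, 500)))   # (300.0, 1300.0)
# Question 3: sales [1500..1800], rate [15..18]%, fixed [500..800]
print(bounds((1500, 1800), (15, 18), (500, 800)))   # (430.0, 1030.0)
```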
IvySheet Survey
Questions
NAME
Please rate how much you agree with the following statements. For each statement, tick one box only. There is space for you to write comments at the end.
Q1. Overall, I found the system to be intuitive.
Q2. Placing an interval in a cell made sense to me.
Q3. I would need more training to complete the tasks.
Q4. I found it easy to follow the effect that an interval had on the rest of the spreadsheet.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)
Q5. The graph helped me to understand the effects better.
Q6. I would definitely use the graph if there were many numbers.
Q7. The spreadsheet became too cluttered or hard to read.
Q8. This tool makes it easier to make mistakes.
Q9. I would use a tool like this if it were available to me.
Q10. I would use intervals in Microsoft Excel if they were supported.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)
Q11. My current tools are adequate for modeling and visualizing uncertainty.
Q12. I would use normal probability distributions more often in my work if I had a tool like this.
Q13. I would model uncertainty more often if I had these features available.

(Each statement is rated: Strongly Disagree / Disagree / Neither Agree nor Disagree / Agree / Strongly Agree.)

Comments

Please feel free to write any comments below.
IvySheet Survey
Observations Sheet [not to be shown to participants]
NAME
Step 1 Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
Step 2 Number of attempts made:
Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
Step 3 Answer 1: (Should be 940 and 1120)
Answer 2: (Should be 300)
Answer 3: (Should be 430 and 1030)
Did they use intervals to get all three answers?
Number of edits made:
Number of errors made:
Number of corrections made:
Number of calls for help:
Time taken:
IvySheet Survey

Results
Participants

Entrance        RS   KF   JH   Average   Stddev
Spreadsheets     4    4    3      3.67     0.58
Visualization    2    2    1      1.67     0.58
Uncertainty      3    3    2      2.67     0.58
Financials       4    4    3      3.67     0.58
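The Average and Stddev columns in these tables are consistent with the sample standard deviation (the n-1 form), which can be verified directly; for example, the Spreadsheets entrance row:

```python
from statistics import mean, stdev   # statistics.stdev uses the sample (n-1) form

# Entrance ratings for spreadsheet software, participants RS, KF, JH (from the table above).
spreadsheets = [4, 4, 3]

print(round(mean(spreadsheets), 2))   # 3.67
print(round(stdev(spreadsheets), 2))  # 0.58
```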
Question   RS   KF   JH   Average   Stddev
1           5    5    4      4.67     0.58
2           5    4    4      4.33     0.58
3           1    1    1      1.00     0.00
4           4    4    4      4.00     0.00
5           4    3    4      3.67     0.58
6           5    4    5      4.67     0.58
7           1    2    2      1.67     0.58
8           1    1    1      1.00     0.00
9           5    4    5      4.67     0.58
10          5    5    5      5.00     0.00
11          1    3    2      2.00     1.00
12          5    5    4      4.67     0.58
13          5    4    4      4.33     0.58
Performance     RS   KF   JH   Average   Stddev

Step 1
errors           0    1    0      0.33     0.58
corrections      4    2    1      2.33     1.53
help             0    0    0      0.00     0.00
time (m)         6    7   11      8.00     2.65

Step 2
attempts         1    1    1      1.00     0.00
errors           0    0    0      0.00     0.00
corrections      0    0    0      0.00     0.00
help             0    0    1      0.33     0.58
time (m)         1    1    1      1.00     0.00

Step 3
Ans1 right       y    y    y
Ans2 right       y    y    y
Ans3 right       y    y    y
intervals        y    y    y
edits            4    3    3      3.33     0.58
errors           0    0    0      0.00     0.00
corrections      1    0    0      0.33     0.58
help             0    0    0      0.00     0.00
time (m)         3    4    5      4.00     1.00