Eindhoven University of Technology
MASTER
Development of a questionnaire for evaluating personal informatics visualizations
Paulussen, K.
Award date:2018
Link to publication
DisclaimerThis document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Studenttheses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the documentas presented in the repository. The required complexity or quality of research of student theses may vary by program, and the requiredminimum study period may vary in duration.
General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain
Eindhoven, 11-07-2018
identity number 0870509
in partial fulfilment of the requirements for the degree of
Master of Science
in Human-Technology Interaction
Supervisors:
dr. E.T. Kersten - van Dijk MSc
dr.ir. P.A.M. Ruijten
Development of a Questionnaire for
Evaluating Personal Informatics
Visualizations
by Karsten Paulussen
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 2
Contents
Abstract 3
1 Introduction 4
1.1 Quantified Self and Personal Informatics 4
1.2 Models of Personal Informatics 5
1.3 From Information Visualizations to Personal Visualizations 7
1.4 What is Insight 8
1.5 Which Insight types exist 9
1.6 Data Visualizations 10
1.7 Evaluating Personal Visualizations 13
1.8 Need for a New Evaluation Methodology 16
2 Study I: Creating the Questionnaire 17
2.1 Introduction 17
2.2 Methods 17
2.3 Results 19
2.4 Discussion 21
2.5 Conclusion 23
3 Study II: Comparison of the Questionnaire with the Insight-Based Methodology 24
3.1 Introduction 24
3.2 Methods 24
3.3 Results & Discussion 34
3.4 Conclusion 42
4 General Discussion 43
5 Limitations 45
6 Future Research 46
7 Conclusion 47
References 48
Appendices 52
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 3
Abstract
The rise of wearables, mobile applications, and other self-tracking tools make it possible to collect
many kinds of data about ourselves. However, many people stop collecting data after a while. One
reason that these people stop tracking is that the visualizations used in these tools do not offer them
the right insights to gain self-knowledge. To know which insights the users can achieve from the
visualizations, researchers and developers need to be able to evaluate these visualizations. Till now,
the methods used to do these evaluations are all time-consuming and complex to conduct. In this
master thesis an easier and more efficient tool for evaluating visualizations is created in the form of
questionnaire. This questionnaire was created in a first study and contains 24 statements. In a
second study, the results of evaluations with the questionnaire were compared with the results of
one of the more time-consuming methods. We found some similarities and differences between the
results of both evaluation methods. These similarities and differences are discussed and advantages
for using the developed questionnaire as an evaluation tool are given.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 4
1 Introduction
As a result of the recent advances in consumer technologies, more and more people are able to
collect a wide range of personal data. The rise of wearables, smartphone applications, and other
self-tracking tools make it possible to keep track of our sleep pattern, daily exercise, energy use, etc.
Flood (2013) predicted 5 years ago that by now, 485 million wearable computing devices will be in
the market. However, statistics showed that this number was already reached last year (Statista,
2017). People start self-tracking for different reasons, for example to find a relation between
different measurements, to document their physical activities, or to set goals (Rooksby, Rost,
Morrisen, & Chalmers, 2014). Despite of the increasing number of users, one of the biggest concerns
is that people often stop using their devices and stop tracking their data. In an overview of reasons
why people stop tracking by survey almost 200 people (Epstein, Caraway, Johnston, Ping, Fogarty, &
Munson, 2016), one of the main reasons was that the visualizations shown by the tool or application
did not give or show the information the user expected to receive. Users expect to achieve self-
knowledge from their data. This self-knowledge can be obtained by getting insights out of their data.
Therefore, it is important that the proposed visualizations contain these insights. To make sure the
visualizations offer the right insights to the user, it is important that these visualizations are
evaluated and a quantification of insights has occurred (Ellis, & Mansmann, 2010). In this master
thesis, a methodology is developed to give developers and researchers the opportunity to make an
evaluation of the personal informatics visualizations and the corresponding insights people can
achieve from these visualizations.
1.1 Quantified Self and Personal Informatics
The practice of collecting and reflecting of self-tracked data is called personal informatics, personal
analytics or quantified self (Li, Dey, & Forlizzi, 2011). Besides, it is used as a term for referring to the
practice of self-tracking, Quantified Self is also the name of a community (Choe, Lee, Lee, Pratt, &
Kientz, 2014). This community started in Silicon Valley by two technology fanatics, namely Gerry
Wolf and Kevin Kelly. They were both tracking their behavior and building self-monitoring
technology. In 2008, they started the site quantifiedself.com and two years later Wolf gave a pitch at
a TED conference (Marcengo & Rapp, 2014), which can be considered as the start of the Quantified
Self movement. The community became bigger and nowadays it organizes meetings all over the
world where people share the insights they got from their tracked data or companies promote their
self-monitoring devices (About the Quantified Self, 2016). The aim of the rise of the quantified self is
to give people a better understanding of their behavior based on collected data, such that behavior
changes can be made to live happier and healthier lives (Etkin, 2016). Marcengo and Trapp describe
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 5
three basic points that define the QS movement, namely: 1) data collection (behavioral, contextual,
or psychological), 2) the displaying of these data, and 3) cross-linking the data to get insights from it.
All three such that the user can achieve self-knowledge from the data. In the beginning, members of
the QS community were all so-called technology enthusiastic, but with the rapid advancements in
technology all kind of people can easily track personal data by using commercial trackers or mobile
apps (Choe, Lee, Zhu, Riche, & Baur 2017), which means that the three basic points, given before,
should hold for all people who want to collect and reflect on self-tracked data. This shift from
technology enthusiasts to everyday people prompted some challenges in the Quantified Self field,
with as mean challenge: ‘why do people stop self-tracking?’. The field of personal informatics, which
is used as term in academic research, studied many self-tracking technologies and recognized the
value of these technologies in the domain of health and well-being. However, a list of barriers
towards the adoption of these technologies is identified (Li, Dey, & Forlizzi, 2011), which included
unaccepted visualization, poor analyzing skills, or no possibility to cross-link the data. To solve these
barriers towards the adoption of self-tracking technologies, more research in the field of personal
informatics needs to be done. But first it is important to understand the process self-trackers go
through and allocate these barriers in this process.
1.2 Models of Personal Informatics
To better understand the process self-trackers go through and how people use personal informatics,
different models are developed.
The best-known model is the stage-based model from Li, Dey, and Forlizzi (2010). They conducted a
survey with 68 people who collect and reflect on personal information. From the answers, they
define a model that consists of 5 iterative stages and for each of these stages they compiled a list of
barriers (Figure 1).
Figure 1. Five-stage model of personal informatics (adapted from Li et al., 2011)
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 6
First is the preparation stage, this stage is about people’s motivation to start collecting and their
decision what and how they want to collect the personal information. Examples of barriers people
encounter during this stage are: choosing the wrong tool or choosing the wrong data to collect.
Next is the collection stage, during this stage people collect information about themselves. Barriers
in this stage can be tool-, user-, or data-related. E.g. the tool is not accessible at the right moment in
time, the user forgot to record data, or the data rely on subjective estimations.
The third stage in the model, is the integration stage. In this stage the collected data is prepared,
combined, and transformed for the user. The integration phase can be long or short. We speak
about a short stage when synchronization of different data over different devices is going
automatically. Examples of barriers in the integration stage are bad transcribing of the data and
scattered visualizations.
The fourth stage of the model is the reflections stage. In this stage the user reflects on their personal
information. Two types of reflection phases exist. Namely, the discovery phase, in which users try to
find behavior to change and the maintenance phase, in which users try to keep maintenance of the
behavior they changed. Problems in the reflection stage occurs when users have difficulties with
retrieving or understanding information.
The fifth and last stage is the action stage, in this stage the user choose what he will do with the
personal information retrieved. The model has some important properties designer and researcher
should consider, namely: 1) the model is iterative; 2) Problems in an earlier stage affect later stages;
3) Each stage is user-driven, system-driven, or both; 4) personal informatics systems can be uni-
facets or multi-facets. Li, Dey, and Forlezzi (2010) believe that this stage-based model, with the
barriers and properties, can help researchers and developers to find solutions and develop new
approaches.
Such a new approach, based on the stage-based model of Li and colleagues, is developed by Epstein,
Ping, Fogarty, and Munson (2015). They argue that the use of personal informatics devices cannot be
strictly separated into stages and therefore introduced the Lived informatics model, an extended
version of the model from Li et al. (2010).
Epstein and colleagues (2015) argue that people not only start tracking with as goal to change their
behavior, but also with other motivations, like comparing their activities with others, or curiosity.
Therefore, they divide the preparation stage into a deciding stage and a selecting stage. In this
master thesis, we assume that users can have different end-goals, but that the initial goal wherefore
people start tracking is to gain knowledge about themselves based on data.
Next, Epstein et al. (2015) also argue that the collection, integration, and reflection stage are an
ongoing process where the different stages can occur simultaneously. They called this process
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 7
tracking and acting.
Furthermore, Epstein and colleagues (2015) also include the lapsing stage, users arrive at this stage
when they stop using the self-tracking system. Lapsing can start by barriers in the collection,
integration, or reflection stage.
Lastly, they also include the possibility to resume in their model. Users can resume tracking their
data for the same or different reasons as before. We agree with the idea of an ongoing process, for
which in each stage the possibility exists that the user stop-tracking. We described before that there
are different reasons why people stop tracking. For each of these reasons associated barriers can be
described.
It is important that we find solutions to overcome these barriers and give users a maximal ability to
get insights out of their data. In this master thesis we will focus on one specific barrier users
encounter during the integration- and reflection stage, namely that the user is not able to get
insights from the visualizations of the presented data, because it does not contain the information
the user wants to get from it.
1.3 From Information Visualizations to Personal Visualizations
From the end of the 1980s big progress is made in hardware technology, which makes it possible for
companies and researchers to store large amounts of data (Keim, 2002). These collections of large
amounts of data also asked for new ways to mining and presenting this data, which lead to the rise
of the Information Visualization (InfoVis) community (Ellis, & Mansmann, 2010). Robertson,
Czerwinski, Fisher, and Lee (2009) describe information visualization as “The art and science of
representing abstract information in a visual form that enables human users to gain insight through
their perceptual and cognitive capabilities (p. 1)”. Two important aspects of this information
visualization community are: 1) Information visualization is used to discover new insights and
knowledge from the data, and 2) Information visualizations are representations of data that
amplifies human cognition (Chen, 2010). Both the definition of Robertson et al. (2009) and the
important aspects described by Chen refer to the importance of taking into account the human’s
capabilities when designing data visualizations. Therefore, many research in the InfoVis community
is done towards the way humans process and perceive data visualizations. Like for example the
influence Gestalt Principles (cognitive psychology) on information visualizations (Kerren, Stasko,
Fekete, & North, 2008). Different topics challenged by the community nowadays are for example
‘transformation of data to enable analysis’, ‘visual design and aesthethics’, and comparison in
information visualization: models and challenges’ (Kerren, Plaisant, Stasko, 2011).
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 8
Beside doing research to solve the main challenges in information visualizations, members of the
community also developed InfoVis toolkits. These toolkits can be used to create data visualizations
and integrate the findings of the community into the different features of the tool. The best-known
example is Tableau, which can be used by less technically sophisticated and knowledgeable people.
These people got less attention by the InfoVis community in the past, but due to the big process in
technology more and more domains (economics, agriculture, psychologists, etc.) are collecting data
and trying to visualize this data to get insights from it (Few, & Edge, 2008). Take for example a
farmer, in the past he just milks his cows twice a day and at the end of each day he could check the
amount of liters milk he collected. Nowadays, data is collected from each cow separately, like for
example the number of liters it gives, but also the number of steps it took, how much hours of the
day it was eating, etc. Thanks to toolkits like Tableau software, this data is visualized such that
farmers can easily consult this data. However, giving these farmers the possibility to easily consult
the collected data does not mean that they can easily get insights from it. Research was needed to
find out in which way the data had to be visualized such that the farmer could easily understand it
and gain knowledge from it.
This example shows that the shift of data collection from the academic and scientific world to other
domains resulted in several challenges for the InfoVis community. Similarly, new challenges
appeared with the rise of the quantified-self community. With the advances in the consumer
technology, not only scientist or domain experts (like farmers, medicals, or biologists (Sariaya, North,
& Duca, 2005)) have access to data visualizations, but all type of users can use self-trackers to collect
data about themselves. We will refer to this group of users as the general user in the remainder of
this thesis. Another big change lies in the type of data that is collected. With the rise of QS and
Personal Informatics, the data became more personal. Therefore, the type of visualizations in
domain of personal informatics is called personal visualizations. Despite the target user changed and
the data to be visualized got more personal, the main purpose of personal visualizations is similar to
this of information visualizations, namely: giving the user insights into their data. New research
needed, and still need, to be done to find out in which way these general users get insights and what
types of insights they want to get from the data visualizations.
1.4 What is Insight
‘The purpose of visualization is insight not pictures’, this is what Card, Mackinly, and Schneiderman
(1999) conclude in their book ‘reading and information visualization: Using Vision to Think’.
Therefore, it is important to define what insight in the domain of quantified self means, which will
help to better understand in which way the general user gets insights from his data. According to the
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 9
Cambridge Dictionary, Insight can be defined as: “the ability to have a clear, deep, and sometimes
sudden understanding of a complicated problem or situation.” Another definition is given by
plaisant, Fekete, and Grinstein (2008, p. 120), who declare insight as: “a nontrivial discovery about
the data.” Both these definitions are rather wide. A more specific definition can be found in the
domain of information visualization (infoVis) in which they try to give insights into complex data.
Saraiya, North, and Duca (2005, p. 444) defines insight as followed: “An individual observation about
the data by the participants, a unit of discovery.” Another study in the InfoVis domain by North
(2006) suggests that any formal definition of Insight is either too restrictive or too general and
vague, therefore he concluded that it Is better to define five different characteristics of insight to get
a better understanding of insight, namely: complex, deep, qualitative, unexpected, and relevant. In
all those descriptions and definitions, the same idea of something that is gained or discovered from
the data by the user is given. Based on this idea, the following definition of Insight will be used in this
master thesis: “A unit of discovery gained from the data, which can lead to better self-knowledge.”
1.5 Which Insight types exist
Now we know what an insight is, it is important to answer the question “which insights can be
gained from a specific visualization?”. Before we can answer this question, it is important to better
understand which types of insights exists and can be offered by a visualization.
In the field of Information visualization, Yang, Li, and Zhou (2014) created a taxonomy of insights
people derive from visualizations. They presented three different simple graphics to more than 500
participants. For each graph, the participant had to express what they understand and learn from it
in free text, with a minimum of 15 words. These texts were coded and analyzed. From these analysis,
they found eight types of insights, which they organized in two groups, namely: basic insights and
comparative insights. The four types of basic insights are Read Value, Identify Extrema, Characterize
Distribution, and Describe Correlation. The four types of comparative insights are compare values,
compare extrema, compare distribution, and compare correlation. In this study, participants had to
achieve insights from data that is not related to themselves. This is different with self-tracked data,
where visualizations are created from personal data. In the quantified-self domain a taxonomy was
created by Choe, Lee, and Schraefel (2015). They analyzed 55 Q-selfers’ (members of the Quantified
Self community) presentations to analyze the insights they got from their data and the visualizations
they use to show these insights. They identified eight insight types, namely: detail, self-reflection,
trend, comparison, correlation, data summary, distribution, and outlier. In Table 1, a description is
given for each of this types with their subtypes.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 10
Table 1: Description of the 8 different Insight Types as given by Choe et al. (2011)
Types Description
Detail 1) Explicitly state the identities of the data points possessing extreme values of the
measure variable
2) Explicitly specify the measured value, its range for one or more clearly identified data
points.
3) Explicitly state the values of categorical variables, labels from the axes, or legends
Self-
Reflection
/Recall
1) Uncaptured data provided by the presenter to understand and explain a
phenomenon shown in the data
2) Collected data contradicts existing knowledge
3) Predict the future based on the collected data
4) Collected data confirms existing knowledge
Trend Describe changes over time
Comparison 1) Compare measured values by a factor (other than time)
2) Compare measured values segmented by time
3) Bringing in external data for comparison
4) Compare two specific instances
Correlation Specify the direct relationship between two variables (but not as comparison)
Data
summary
Summary of collected data (such as number of data points and duration of tracking)
Distribution 1) Explicitly state the variability of measured values
2)Explicitly describe the variation of measured values across all or most of the values of
a categorical variable
Outlier Explicitly point out outliers or state the effect of outliers
In a recent study Choe et al. (2017) built a web-based application (Visualized Self) based on the eight
insight types and conducted think-aloud study to examine how people reflect on their personal data
and what types of insights they gain. In this study the participants were not only members of the
quantified-self community, but also other self-trackers. Based on interviews with the participants,
they conclude that their web-application helped participants identify most of 8 personal insight
types reported.
1.6 Data Visualizations
Despite of the fact that the way self-tracking applications provide the data to the user is often not in
line with the user’s expectations (Epstein et al., 2016), developers already came up with multiple
ways to make the data available for the user. All attempting to give the user the opportunity to gain
some insights. The approaches current applications use to show the collected data can be divided in
three types of methods, namely: abstract ways to present data, data visualizations, or providing raw
data.
Protagonist of the abstract ways of presenting self-tracked data argues that this data is mostly
abstract in nature (Moere, 2008). Measured data like heart rate, skin conductance, etc., do not have
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 11
a one-to-one relationship with objects in the real world. Therefore, they belief that more natural
ways of presenting data can also be successful in conveying information (Consolvo, McDonald, &
Landay, 2009). One example of such an abstract way of presenting the data to the user is by art. Fan,
Forlizzi, and Dey (2012) developed a web application, called Spark, that visualizes physical activity
data from Fitbit trackers in abstract art. Users could choose among five visualizations when entering
Spark, four abstract ways and one graph. After three weeks, they interviewed several families who
used the application. Their results suggest that the abstract visualizations are appealing and
aesthetic, but they also found that most participants chose for the graph visualizations when they
looked for specific information, daily trends, or activity over time. Another example of visualizing
data in an abstract way, is the use of physical visualizations. Stusak, Tabard, Sauka, Khot, and Butz
(2014) presented physical running data with data sculptures. From every run, they created a unique
physical piece, which is part of a larger sculpture. After a three-week field study, they performed
interviews to evaluate their concept. The results of these interviews showed that participants had
difficulties to understand how their run was visualized. More recent studies also investigated the use
of palatable representations and chocolate printed messages to visualize physical activity data (Khot,
Lee, Aggarwal, Hjorth, & Mueller, 2015; Khot, Pennings, & Mueller, 2015). The results of these
studies all show that the abstract ways of visualizing data have some advantages in attractiveness
and aesthetics. However, some of them also show that users have difficulties to get insights from
these visualizations. As we described before, the main purpose of personal visualizations is giving
the users insights into their data.
The more traditional way of presenting data with graphical representations (graphs, scatterplots,
timeline-visualizations, etc.) can be more effective in giving the user the opportunity to get insights
from their data. We will give different examples of studies showing that general users can get
insights from graphical presentations. The first example is MobiTop, developed by Park, Oliver, and
Pedro (2015). It is a self-reflection tool for understanding browsing behavior. The tool shows an
overview of the user’s mobile browsing app with different graphs. A first pilot study with the tool
showed that users could get insights from these graphs.
Another example is Lullaby, a capture and access system for understanding the sleep environment
(Kay et al., 2012). Lullaby combines temperature, light, motion, audio, and sleep data and show this
data to the user with graphical representations, like timeline visualizations. To evaluate the system,
Lullaby was installed in four participants their homes. After a two-week study, the participants were
interviewed to explore their experiences with the system. The results of the interviews show that
the participants could get insights from the graphs.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 12
A recent study on preventive blood self-monitoring from Verdezoto and Grönvall (2016) also
investigate the usefulness of graphical representations. They create different line charts (daily and
weekly) visualizations from participants blood pressure data. The visualizations were shown in a
workshop, where participants, mainly elderly, had to evaluate them. Most of them found the charts
useful and got insights from them.
In another study, Cuttone, Lehmann, and Larsen (2013) showed a mobile personal informatics
system with visualizations in the form of spiral timelines and bubbles, which are more recent types
of visualization techniques. In their experiment, they collected location data from 136 participants
for four months. A participant survey afterwards showed that more than half of the participants
reported preference for getting insights with these visualizations.
All four of these examples show that users could get insights from graphical and scientific
visualizations, even when these users were general users without any knowledge of graphs, like the
elderly in the study from Verdezoto and Grönvall. These results are in line with a study from Smuc et
al. (2008). They investigated the generation of insights and understanding by means of visualizations
without any training. A case study was conducted to find out if some knowledge about a
visualization is necessary. In the study, participants, both experts in visualizations and non-experts,
were asked to give the insights they got from two types of visualizations (cycleplots and multiscale
visualizations) using the think-aloud method. When they were not able to generate new insights,
they got a short explanation of the visualizations. Next, they were asked again to give the insights
they get by using the think-aloud method. Afterwards the results of the expert in data visualizations
were compared with the results of non-experts. Little differences were found between them, both
could generate insights into the data in similar time and quality.
All these examples show that the use of graphical representations can be effective to give the user
the opportunity to get insights from their data. However, users often stop tracking because they are
not satisfied with the presented information and the visualization do not always show what the user
expected, even when their data is presented with graphical representations. This allows us to
conclude that it is important to know which insights can be obtained from a graphical
representation. Or in other words, graphical representations need to be evaluated on the insights
the general user can achieve from them.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 13
1.7 Evaluating Personal Visualizations
Based on the different insight types found by Choe et al. (2015) and the different studies showing
the capability of the general self-tracker to get insights from graphical visualizations, we believe that
the number of people who stop tracking can be reduced by offering them the right visualizations.
These visualizations need to provide the self-tracker the data-based insights that he expected to gain
from the visualizations. But how can we measure if the provided visualization gives the user the
opportunity to get those insights from the data? Or in other words: in which way are visualizations
evaluated on the insights people can get from them?
Despite of different attempts made in the past to create solid theories and models, there is still a lot
of research needed in the field of visualization evaluations (Keim et al., 2010). One of the main
reasons that proper evaluation of visualizations is hard is that humans play a central role in these
visualizations. In chapter 8 of the book: ‘solving problems with visual analtyics’, Keim et al. (2010)
describe the state of the art of evaluation methodologies in visual analytics. Methodologies and
metrics are needed to objectively measure usability, learnability, quantification of insights and
technology benefits. North (2006) focuses on the question: how to measure insight? He argues that
the purpose of visualization evaluation is to determine to what degree visualizations achieve
insights. In his Visualization Viewpoints paper, he proposes two new evaluation methods that
measure insight more directly. In the first method, he suggests including more complex benchmark
tasks in the existing controlled experiments. These tasks can ask for correlations, distributions or
other types of insight types. Where in previous controlled experiments with benchmark tasks, the
correctness measure is Boolean (right or wrong), he argues to work with multiple choice questions
for these tasks. North argues that these types of tasks generally support visualization overviews. The
second method North proposes is to eliminate benchmarks tasks and using a qualitative insight
analysis in the form of a think-aloud study. Afterwards the qualitative data must be convert into
quantitative data by coding. After converting, researchers can count the number of insights and the
time they needed. North realizes that these proposals are a good first step to new methods that
determine whether visualizations are achieving their grand purpose, namely insight. This paper of
North can be considered as one of the leading papers towards insight-based visualizations.
The first example of an insight-based methodology for evaluating visualizations based on the
guidelines of North, was in the area of bioinformatics. Saraiya, North, and Duca (2005) evaluate five
different visualization tools for bioinformatics. An open-ended think-aloud study was done where
participants analyze the data. Saraiya et al. conclude that the methodology succeeded in providing a
good analysis of the insight capabilities of the visualizations, but that there are still some difficulties.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 14
The main difficulties are that the method is labor intensiveness and users should be motivated for
the think-aloud study. They propose that further work is needed to overcome these main difficulties,
such that the effectiveness of visualizations in many domains can be measured more easily.
In a more recent study from North, Saraiya, and Duca (2011) a comparison is made between a
benchmark tasks evaluation method and the insight-based evaluation method for information
visualization. One of the main conclusions of this study was that the insight method refutes the task
method, which can suggest that users behave differently when not in the forced direction of a task-
based method. Also, in this study the insight method was surprisingly more powerful than the task
method.
Despite the insight-based methodologies proposed by North and colleagues can be seen as one of
the leading methods in evaluation of visualizations, recently other evaluation methods are shown to
be useful as well. One of such a method is the Elicitaition Interview Technique. Hogan, Hinrichts, &
Hornecker (2015) discuss the potential of this interview technique in the context of visualization.
They claim that the Elicitation interview technique is a valuable addition to current evaluation
techniques. It can help to evaluate how people experience visualizations and can be applied when
studying how visualization supports data analysis processes.
The papers, described above, are all theoretical proposal on how to evaluate visualizations, but
there are also studies, in different domains, that evaluate visualizations in real world settings.
One of these studies was done in the domain of the Learning Analytics. Govaerts, Duval, Verbert, &
Pardo (2012) developed the Student Activity Meter that visualizes learner actions. They had different
design iterations and used both quantitative and qualitative evaluation studies to access the
usability, use, and usefulness of the visualizations. For the first iteration computer science students
evaluated the visualizations in a task-based interview. The students’ satisfaction was also measured
with the System Usability Scale and Microsoft Desirability Toolkit. For the next iteration teachers
evaluated the visualizations by filling in a survey. This survey contained multiple choice and open
questions. For the last iteration computer science teachers evaluated the visualizations in a face-to-
face interview with fixed questionnaire of 25 open questions and tasks. Govaerts et al. (2012)
conclude that the methodology they used can be useful for other evaluations of visualizations. They
believe that evaluating iteratively is very valuable and hope that this work will inspire others to do
more user studies on visualizations.
Another example of evaluation of visualizations from real-life data came from Zhao, Ng, & Cosley
(2012). They developed pieTime, a visualization of people’s behavior across communication media.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 15
They evaluate pieTime on how well it supports reflection on patterns of activity. The evaluation was
as follow: the participants first got some time to play around with the tool while thinking-aloud; next
an interview was done, where some open-questions were asked. One of the limitations Zhao et al.
remarked, was that they used a short-term lab-based deployment to evaluate the tool and that such
evaluations might overstate their value. Sharmin & Bailey (2013) developed ReflectionSpace, an
interactive visualization tool for supporting activities associated with reflection in action. It is a tool
that map design materials to the design phase and context of use. To evaluate the tool, Sharmin &
Bailey followed a cued recall paradigm where participants had to recall things about a project with
and without the tool. Afterwards the participants were interviewed about their overall perceptions
of the tool.
The tree examples described above are not related to the quantified-self domain, this is because
Personal visualizations or visualizations of personal informatics data are relatively new. Therefore,
the number of studies done towards effectiveness of these evaluations is limited. Choe et al. (2015)
identified their 8 types of insights by analyzing public data in the form of Q-selfers’ presentations,
and to test these types of insights, Choe et al. (2017) built an application and let evaluate the users
the application by doing an insight-based think-aloud study. The use of different qualitative
methods from other disciplines in the quantified-self domain was proposed by Thudt et al. (2017).
They provide four empirical methods (Elicitation Interviews, Probing, Diary studies, and Analysis of
public data) which can inform or evaluate the design of personal visualizations. All four of these are
qualitative and time-consuming methods. Furthermore, it is hard to compare results of different
studies which used these evaluation methods.
Despite of the work by North (2006) and other researchers who came up with an insight-based
methodology to evaluate visualizations, we see that often visualizations in the personal informatics
domain are evaluated with complex and intensive methodologies. Visualizations are often evaluated
by experts or by doing a qualitative method where the visualizations are shown to a limited number
of participants and the questions or tasks are generated by the researchers without any reference to
previous studies.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 16
1.8 Need for a New Evaluation Methodology
We showed before that it is very important in the field of personal informatics to know if the user
can gain insights from the visualizations. Therefore, developers and researchers must be able to
evaluate the visualizations they created, before distributing them to the self-trackers. We believe
that in this way, the number of self-trackers that stop tracking their data will be diminished. Till now
evaluation of visualization is mostly done with qualitative methods. Considered the rapid growth of
the quantified self, there clearly is a need for a more efficient and easy methodology to evaluate
personal informatics visualizations. Such an easy and time-efficient data-collection method is a
questionnaire. Next to its efficiency and its usability, a questionnaire supports objectively and allows
easy comparison between results. Therefore, the main aim of this master thesis is to develop a
quantitative method to evaluate personal-informatics visualizations on the insight people can gain
from them. A second aim of this thesis is to find out to which extend the insights found with the
developed questionnaire can be compared with insights found with a qualitative method.
These two aims will be studied in two different studies. In a first phase, the questionnaire is created
based on current literature and a selection of the best question-items. In a second phase, the
created questionnaire is compared with the insight-based methodology, which is given earlier in this
section.
In the remainder of this master thesis, these different phases to develop and compare the
questionnaire for evaluation of personal visualization are shown. For each of these phases, the study
performed during the phase is described and the results of these studies are given and discussed.
After describing each phase separately, a general discussion is made in which an answer on the main
aim and other goals of thesis is given. Next, limitations of the conducted studies are given together
with some suggestions for future research. Finally, an overall conclusion is made.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 17
2 Study I: Creating the Questionnaire
2.1 Introduction
The developed quantitative method is a questionnaire with which researchers and developers can
evaluate their visualizations on insights people can gain from them. As shown in the introduction,
previous studies found that there are eight different insight types people can get from a personal
visualization (Choe et al., 2015). Therefore, the questionnaire should be able to measure for each of
these insight type if a person is could achieve it from a visualization.
Qualitative methods showed that statements given by a participant can be categorized to a specific
insight. We argued that similar statements related to an insight type could be used to develop the
questionnaire. In this first study, we investigated which of these similar statements should be
included in the questionnaire.
2.2 Methods
2.2.1 Design & Participants
A survey was posted online for three weeks to determine which statements per insight type should
be included in the final questionnaire.
In this survey, participants had to classify a set of statements towards one of the eight insight types
as found by Choe et al. (2015) (table 1). Participants had to classify each of these statements to one
of the eight insight types to see if the statements indeed related towards the insight types they were
created for. We argued that there was only one right answer for each statement, namely the insight
type for which the statement was created. As one of our main goals was to find an efficient way to
evaluate visualizations, we decided to include only a small number of statements for each insight
type. We looked for the statements with the highest classification and expected to find at least four
statements for each insight type that have a high classification rate.
More than 400 people opened the online survey, but most of them stopped after reading the
informed consent or reading the descriptions of the insight types. A total of 63 participants started
classifying statements. Only 51 of them classified all 55 statements towards one of the insight types.
The other 12 participants filled in a part of the survey. As some of the insight types and their
descriptions ask for some basic statistical knowledge, the online survey was only shared with
university students. All students following the course ‘Research Methods’ at Eindhoven University of
Technology were send an invitation. An invitation for the online survey was also send to fellow
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 18
students in the Master Human-Technology Interaction and university students in other master
degrees.
2.2.2 Statements
To create the initial pool of statements, we collected all quotes categorized to one of the insight
types that were given in previous studies. These previous studies existed of studies reviewed by
Kersten-van Dijk, Westerink, Beute, and IJsselsteijn (2017) in their critical review on personal
informatics, self-insights and behavior change together with other recent studies (Choe, Lee, Lee,
Pratt, & Kientz, 2014; Choe, Lee, Kay, Pratt, & Kientz, 2015; Choe, Lee, Zhu, Riche, & Bauer, 2017;
Collins, Cox, Bird, & Cornish-Tresstail, 2014; Cuttone, Lehmann, & Larsen, 2013; Epstein, Cordeiro,
Bales, Fogarty, & Munson, 2014; Kay et al. 2012; Li, Dey, & Forlizzi, 2011; Rooksby, Rost, Morrison,
& Chalmers, 2014) .
In these studies, insights were coded from quotes found during interviews, focus groups or think
aloud studies. These quotes were listed for each insight type. Based on these lists and the
descriptions of each insight type given by Choe et al. (2015) 6 to 8 statements were created per
insight types. At the end 55 statements were created (Appendix A).
2.2.3 Procedure
When participants opened the link to the online survey, they first got an informed consent
(Appendix B). They had to read and accept this informed consent before they could start filling in the
survey. Next the description of each insight type (table 1) was shown to the participants. They were
asked to clearly read these descriptions, before entering the next page. The next seven pages each
showed eight of the created statements in a random order. Participants had to classify each of these
statements to one of the eight insight types. During the survey, the participants could always consult
these descriptions. After classifying all 55 statements, the goal of the study was explained.
2.2.4 Data Analysis
For each statement, the classification rate that this statement was classified to the insight type for
which is was created is calculated. Furthermore, we also calculated the proportions that the
statement was classified to the other insight types.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 19
2.3 Results
2.3.1 Outliers and Partly filled in Surveys
One participant, who filled in the whole survey, only categorize 16% (9/55) of the statements
correctly. We argue that this participant did not fully understand all 8 insight types and decided to
drop this observation.
As described in the method section, 12 participants only partly filled in the survey. From these 12
participants, 5 participants stopped after the first page of statements, 4 after the second page, one
after the third page, and 2 after the fourth page. We had a better look to these 12 participants,
trying to understand the reason why they stopped filling in the survey and drop the results of them
when needed.
Participants with ID’s 25, 30, 79, 116 and 151 all stopped after filling in one page of statements,
which means they classified 7 or 8 statements. All of them classified at least three of these
statements to another insight type than the type for which the statement was created. We argue
that 7 out of 52 statements classified is too low to consider these data point as reliable and dropped
them. ID 71, 89, 189 and 191 filled in two pages of statements, except from ID 89, these participants
classified most of the 15 or 16 statements right. The person with ID 89 classified only 6 of the 16
statements right. Based on this number and the fact that this person only filled in 16 of the 55
statements, we decided to drop this observation. We calculated for the remaining 56 participants
the number of statements they qualified correctly on the total number of statements they qualified.
An average classification rate of 54.76% (SD=10.66) is found. If we only considered participants who
classified all statements, a classification rate of 55.05% (SD=10.09) is found, which is a similar result.
Therefore, we decided to include these partly filled in surveys.
The analysis to find the best statements for each insight type is done with the remaining 57
observations.
2.2.3 Classification per Insight Type
As the results of total classification rate already illustrated, not all statements were classified
towards the insight type for which these statements were created. As one of our main goals was to
find an efficient way to evaluate visualizations, we decided to include four statements for each
insight type in the final version of the scale. Table 2 gives for each insight types the four statements
that had the highest classification rate for the insight for which they were created. The names of the
statements indicate for which insight type the statement was created. For each of these statements
the relative frequencies are given for the number of times the statements were classified into the
eight different insight types.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 20
Table 2: Four ‘best’ classified statements per insight type with its relative frequencies of correct classifications
Statements Det
ail
Co
mp
aris
on
Co
rrel
atio
n
Dat
a
Sum
mar
y
Tren
d
Self
-re
flec
tio
n
Ou
tlie
r
Dis
trib
uti
on
Detail 3 73.08% 0% 0% 15.38% 1.92% 3.85% 1.92% 3.85%
Detail 5 75.93% 3.70% 0% 7.41% 3.70% 1.85% 3.70% 3.70%
Detail 6 57.41% 9.26% 3.70% 5.56% 0% 20.37% 1.85% 1.85%
Detail 8 62.96% 1.85% 0% 14.81% 0% 7.41% 1.85% 11.11%
Comparison 1 0% 62.96% 33.33% 0% 1.85% 1.85% 0% 0%
Comparison 3 5.66% 86.79% 1.89% 0% 3.77% 0% 0% 1.89%
Comparison 4 5.66% 90.57% 1.89% 0% 0% 1.89% 0% 0%
Comparison 7 3.85% 86.54% 0% 1.92% 5.77% 1.92% 0% 0%
Correlation 1 1.85% 11.11% 77.78% 1.85% 1.85% 1.85% 3.70% 0%
Correlation 3 1.92% 19.23% 75.00% 0% 3.85% 0% 0% 0%
Correlation 5 1.85% 11.11% 85.19% 0% 0% 0% 0% 0%
Correlation 6 1.92% 9.62% 78.85% 5.77% 1.92% 0% 0% 1.92%
Data Summary 1 12.96% 0% 0% 74.07% 1.85% 0% 5.56% 5.56%
Data Summary 4 24.00% 0% 0% 70.00% 0% 0% 2.00% 4.00%
Data Summary 5 5.56% 5.56% 0% 72.22% 7.41% 0% 1.85% 7.41%
Data Summary 6 3.70% 0% 0% 85.19% 0% 7.41% 3.70% 0%
Trend 2 2.00% 2.00% 10.00% 4.00% 74.00% 4.00% 2.00% 2.00%
Trend 3 2.00% 4.00% 8.00% 6.00% 66.00% 4.00% 0% 10.00%
Trend 5 0% 3.85% 0% 1.92% 88.46% 1.92% 0% 3.85%
Trend 6 1.92% 13.46% 0% 9.62% 69.23% 0% 0% 5.77%
Self-reflection 4 0% 3.85% 0% 3.85% 3.85% 84.62% 0% 3.85%
Self-reflection 5 4.00% 8.00% 4.00% 0% 2.00% 70.00% 10.00% 2.00%
Self-reflection 7 1.92% 5.77% 0% 1.92% 9.62% 76.92% 3.85% 0%
Self-reflection 8 1.92% 1.92% 3.85% 0% 1.92% 80.77% 7.96% 1.92%
Outlier 1 3.70% 11.11% 3.70% 0% 5.56% 1.85% 62.96% 11.11%
Outlier 3 26.00% 10.00% 10.00% 8.00% 0% 12.00% 28.00% 6.00%
Outlier 5 9.62% 19.23% 0% 3.85% 1.92% 13.46% 44.23% 7.69%
Outlier 6 9.62% 34.62% 0% 1.92% 0% 0% 46.15% 7.69%
Distribution 2 0% 5.66% 1.89% 11.32% 30.19% 7.55% 1.89% 41.51%
Distribution 4 31.48% 1.85% 1.85% 16.67% 12.96% 14.81% 0% 20.37%
Distribution 5 24.07% 7.41% 3.70% 18.52% 7.41% 16.67% 0% 22.22%
Distribution 6 7.69% 7.69% 3.85% 7.69% 40.38% 1.92% 3.85% 26.92%
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 21
2.4 Discussion
The average classification rate found showed that participants classified slightly more than half of
the statements to the insight type this statement was created for. This showed that despite of the
fact that the statements were based on quotes which were categorized to these insight types in
previous studies, it was still very difficult to classify them. No clear reason, based on the data, can be
given for these results.
Many people opened the survey, but stopped after reading the description of the insight types. This
can indicate that the descriptions of the insight types are hard to understand. We were aware that
some basic statistical knowledge was needed to understand these descriptions.
Despite of the fact that the average classification rate was rather low, we found for six of the eight
insight types at least four statements with a high classification rate. As shown in table 2, these
insight types were: Detail, Comparison, Correlation, Data Summary, Trend, and Self-reflection.
For most of the statements in the insight types, a classification rate higher than 70% was found.
Compared to the average classification rate, these rates are high.
For the statements with a classification rate lower than 70%, we found a big difference with the
other insight types, which means that these statements were harder to classify to the type they
were created for, but cannot be classified as one of the other insight types. Therefore, we decided to
include these statements in the final questionnaire.
The results in table 2 also show that we could not find four statements that clearly can be classified
to Outlier. A large proportion of participants classified the statements created for insight type
Outlier to insight type Detail or Comparison. An explanation can be found in the descriptions given
by Choe et al. (2015) and thus also at the beginning of the study (table 1). Insight type outlier is
described as “Explicitly point out outliers or state the effect of outliers”. To decide if an observation
is an outlier and thus explicitly point out an outlier, you have to compare this observation with other
observations in the visualization. This can explain the high proportion of participants that classified
these statements as Comparison.
Similarly, the description given to the participants for Detail was as followed: “Explicitly state the
identities of the data points possessing extreme values”. Often, an observation is considered as an
outlier, when the measured variable is extremely different from other observations. In this way, we
can consider insight type outlier as a subtype of insight type Detail.
Based on the high proportion found for types Detail and Comparison and the clear overlap in the
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 22
descriptions of outliers and these types, we argued that insight type Outlier is not mutual exclusive
from these two other insight types. Therefore, we decided to not include statements for Outlier in
our final questionnaire.
A second insight type, for which we could not find four statements with a high classification rate, is
Distribution. The results in table 2 shows that statements created for Distribution are often classified
as Detail or Trend. Similar to Outlier, these high classification rates for these types can be explained
with the descriptions given to the participants (table 1). The second point in the description of
Insight type Detail was: “Explicitly specify the measured value, its range for one or more clearly
identified data points”. To specify a range for more data points, you have to take a better look to
the minimum and maximum value of these data points. Overlap can be found with the first point in
the description given for Distribution, which was as followed: “Explicitly state the variability of
measured values”. To state a variability of measured values, you also have to look the minimum and
maximum values found. This explains why a large proportion of the participants classified these
statements to Detail instead of Distribution.
Furthermore, to state a variability of a variable, you have to consider all measured values. Self-
tracked data is often measured in time, which makes that the variability of a variable will often be
stated with respect to time. However, the descriptions of changes over time is categorized as insight
type Trend. We conclude that ‘Describing the changes over time’ and ‘State the variability of
measured values’ cannot always be considered as two different types of analysis. This explains why
many participants classified statements created for insight type Distribution as Trend.
Based on the high proportion found for types Detail and Trend and the clear overlap in the
descriptions of Distribution and these types, we decided to not include statements for Distribution in
our final questionnaire.
The final questionnaire is created based on the discussion before and shown in Appendix C. The
questionnaire exists of 24 statements, four for each of the six insight types that were included.
For these statements participants have to indicate on a 7-point scale ranging from 1- ‘strongly
disagree’ to 7 – ‘strongly agree’ how much they agree with the statements. A 7-point scale was
chosen such that the variety of answers participants can give is large enough and measurements of
insight types are not dichotomous. Based on what the participants indicate on these statements, a
score for each insight type can be calculated, such that a visualization can be evaluated on the
insight types people can get from them.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 23
2.5 Conclusion
The main goal of this study was to create a questionnaire with which researchers and developers can
evaluate data-visualizations on the insights people can get from them. Based on the results of this
study we reduced the number of insight types from eight to six. For each of these six insight types,
four statements were found.
We believe that the median score people give on these four statements will indicate how easily
participants can gain the corresponding insight type from a personal informatics visualization.
To check the validity of the questionnaire a second study needed to be done.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 24
3 Study II: Comparison of the Questionnaire with the
Insight-Based Methodology
3.1 Introduction
After developing the questionnaire, a study is performed to test the questionnaire on its concurrent
validity. Therefore, we compared the questionnaire with one of the complex and time-consuming
qualitative methods described in the general introduction. As shown in this introduction, different
qualitative methods are used in the past to evaluate visualizations. The best-known method to
evaluate personal visualizations is the insight-based methodology as proposed by North (2006).
Besides, this paper of North can be considered as one of the leading paper towards measuring
insights, the insight-based methodology was also used by Choe, Lee, Zhu, Riche, and Bauer (2017) to
evaluate their visualization on the insight people could get from it. The insight-based methodology
exists of an open-ended think-aloud study, where the participants have to discover visualizations.
In this study, multiple visualizations will be evaluated with both the developed questionnaire and the
insight-based methodology as it was conducted by Choe et al. (2017). In this way the results of the
developed questionnaire can be compared with the qualitative method, both on visualization- and
on insight level.
3.2 Methods
3.2.2 Participants
Fifteen participants took part in the experiment. They were students at Eindhoven University of
Technology and registered in the database of the Human-Technology Interaction group. The
participants included five men and ten women. They were between 20 to 25 years old (mean = 22,
SD = 1.5). The experiment took 40 minutes and a compensation of €7 for participating was provided.
3.2.3 Design & Materials
A within subject design was used to compare the developed questionnaire with the insight-based
methodology. The insight types a participant could achieve from a visualization was measured with
both methods. The 24-item scale developed in study 1 was used, which gives for each of the six
insight types a score on a 7-point scale how easy a participant could gain an insight type from a
visualization. The insight-based methodology measured for the same six insight types if they could
be achieved or not. As described in the introduction, the insight-based methodology exists of a
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 25
think-aloud interview, were people had to discover data-visualizations and in the meantime had to
say at loud what they were thinking or what they were doing.
Participants first conducted the think-aloud interviews before filling in the questionnaires. In this
way, insights they achieved during the think-aloud interview were not triggered by statements they
read in the questionnaires.
Visualizations
In total six different visualizations were created, one test visualization and five visualizations used in
both methods. The test visualization was used to make the participant familiar with the think-aloud
protocol. We used the Plotly tool, in Python, to create the interactive visualizations. These
visualizations were implemented in a framework such that the visualizations could been shown via a
web application.
The visualizations were based on visualizations used by existing mobile fitness applications. As
described in the introduction, people can keep track of their sleep pattern, daily exercise, sport
activities, energy use, etc. While some of the applications measures multiple features, for example
Fitbit app, Moves app, Google Fit app. Others are more specialized in collecting and visualizing one
type of measurement, for example Runtastic, Pillow Sleep, or Nike+ app. The researcher of this
master thesis downloaded and tested multiple of these free fitness applications to get more
acquainted with the way existing applications visualize their data.
In this master thesis we will not only focus on one type of data that can be collected, but try to
develop a methodology that can evaluate visualizations from different types of self-tracked data.
Therefore, the six visualizations were all based on different types of data that can be collected. Each
visualization existed of two graphs, which were created from the same data or from related types of
measurements. The graphs in three of the six visualizations were based on one-month Fitbit data
collected by one of the supervisors. The graphs in the other three visualizations were created from
simulated data for the same month. Beside the different types of data, we also decided to use
different types of graphs that are used in existed mobile fitness applications.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 26
The test visualization existed of a bar chart and a pie chart, which presented the daily number of
steps and distance walked of a person for one month (Figure 2). The bar chart showed these number
of steps and distance walked for every day separately. The line chart gave the sum of these variables
until a specific day. This specific day could be chosen by the participants
Figure 2. Test Visualization – Bar charts presenting a person’s daily number of steps and distance for one month, the line chart shows the total number of steps and distance (km) a person took until a specific day in
the month.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 27
The first visualization, which was used for both methodologies, contained a stacked bar chart and
pie chart to show the duration of different types of activity of a person throughout a day (Figure 3).
The stacked bar chart showed these activity in minutes for every day of the month. The pie chart
gave the activity for one specific day in percentiles.
Figure 3. Visualization 1 – A person’s different ways of activity throughout the day in minutes given in stacked bar charts. A pie charts show the different types of activity a person had in percentiles for one specific day.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 28
The second visualization presented the number of calories a person burned in rest and by activity
per day. Data for one month was shown with a bar chart and a scatter plot (Figure 4). The bar chart
showed for every day the calories burned by rest, by activity and in total. The scatter plot presented
the calories burned in rest vs the calories burned by activity per day.
Figure 4. Visualization 2 – showing a person’s calories burned in rest and by activity for a day for one month with a bar chart. The scatter plot shows the calories burned by activity versus the calories burned in rest for
each day.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 29
The third visualization existed of an Gantt Chart and Gauge chart which presented objective and
subjective sleep data for one month (Figure 5). The Gant Chart presented the sleep pattern of a
person for every night of the month. The Gauge chart gave a self-indicated sleep quality score for
each night.
Figure 5. Visualization 3 – A gauge chart shows a person’s sleep pattern for every night in one month. For every night the hours of awake, light sleep, REM, and deep sleep is given. The second graph show the self-
reported sleep quality for one night.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 30
In the fourth visualization, the first graph presented the body temperature data of one month with a
heat map. The second graph presented the Heart rate of this person for the same month with a line
chart with error bars (Figure 6).
Figure 6. Visualization 4 – The graph above shows a person’s body temperature of one month, measured three times a day is shown in a heatmap. The graph below shows the same person’s heart rate with a line chart with
error bars. The error bars indicate the maximum and minimum heart rate reached that day.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 31
The fifth and last visualization participants had to evaluate existed of a Bubble chart and two
boxplots presenting a person’s running and cycling activities of one month (Figure 7). The Bubble
chart presented each activity on a time-scale. The boxplots presented a summary of all running and
cycling activities.
Figure 7. Visualization 5 – The graph above shows a person’s cycling and running activities of one month shown in a bubble chart. The size of the bubble indicates the speed of the activity. The graph below gives a
summary of the running and cycling activities based on speed and distance.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 32
Questionnaires
For each of the visualizations, the questionnaire developed in the first phase of this master study
(Appendix C) was used to measure the insight types participants achieved from the visualizations. As
many of the statements in this qeustionnaire were for general data types, a modified questionnaire
was created for each of the visualizations (Appendix D).
For example for the first visualization, which presented the minutes a person was active, the
statement: “It is easy to compare two of my latest runs, based on the speed and the distance” was
tansformed to “It is easy to compare two days, based on the minutes sedentary and the minutes
active”.
3.2.4 Procedure
The experiment was conducted in the general-purpose lab of the Human-Technology Interaction
group. After arriving at the lab, the participants first read and signed the informed consent form
(Appendix E). The experiment leader shortly revised the procedure as they could read in this
informed consent to make sure the participant understood the think-aloud protocol. After explaining
the procedure, the recording was started.
Next, the participant took place behind the laptop and the test visualization was shown. Before the
participants started discovering the visualization, the following general question was asked: “What
can you learn from the data presented with this visualization and which information do you get from
this visualization.” When the experiment leader experienced that the thinking aloud goes well, he
told the participants to go to the next visualization.
Before each visualization, the same general question was asked. When the experiment leader
noticed that the participant was not thinking out loud, questions like “Why did you do this?” or
“What were you thinking?” were asked. If the participants indicated that they could not find any
new information, they could choose the next visualization to discover. After discovering all five
visualizations, the recording was stopped.
After the think-aloud interview, the participants were asked to fill in the developed questionnaire for
the five visualizations they just discovered. The questionnaires were given on paper and participants
filled them in, in the same order as they discovered the visualizations. Participants could still consult
the different visualizations when filling in these questionnaires. After filling in the questionnaires,
participants were debriefed and could ask questions about the purpose of the experiment.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 33
3.2.5 Data Analysis
All quotes participants gave during the think-aloud interviews that could be related to one of the
eight insight types were collected. A content analysis was conducted to categorize these different
quotes to the insight types found by Choe et al. (2015). These categorizations were done by
comparing the quote with the list of quotes collected during the first phase of this thesis and with
the description of each insight type (Table 1).
During the questionnaire, participants had to fill in four statements for each insight type. The
Cronbach’s Alpha for each of the six insight types were calculated to investigate the level of internal
consistency. Furthermore, medians for the results of the four statements were calculated. In this
way one 7-point scale score for each insight type was found.
The scores for the insight types measured with the questionnaires were compared with the insights
found during the think-aloud study. We expected to find similar results with both methodologies. In
this way, the developed questionnaire could be tested on concurrent validity.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 34
3.3 Results & Discussion
In this section the results of the second study will be given and discussed. First an overview and
description from the data of both the questionnaire and the think-aloud study is provided. Next, a
comparison is made between the insights found during these think-aloud interviews and the
questionnaires. Finally, the results on insight level over all five visualizations are discussed.
3.3.1 Determination of Insights
As described in the method section, all 15 participants filled in a questionnaire for five visualizations.
In these questionnaires, six different insight types are measured by 4 different statements. In the
previous study, we showed that a high classification rate was found for each of these statements
towards the insight types for which these statements were created. By letting participants
evaluating visualizations in this second study, we could investigate the level of internal consistency
of these four statements (Table 3).
A lower alpha value was found for insight type Detail. Which suggests that the four statements used
in the questionnaire are not measuring the same concept. In the discussion section of study 1, we
already argued that insight type Outlier be a subtype of Detail, this already indicated that insight
type Detail is a wider concept, containing multiple descriptions. The description given by Choe et al.
(2015) (Table 1) also indicates that the insight type Detail can be gained in multiple ways. For
example, by finding a specific value in the visualization, but also by understanding the legend and
the corresponding labels. However, we believe that if participants indicated in the questionnaire
that it is easy to achieve an insight type from the visualization for most of the statements, then they
could gain this insight type. Therefore, to calculate if the participant could achieve a specific insight
type, the median of the 4 different statements is taken, such that outliers will have a smaller effect
of the results.
Table 3: The Mean, Standard Deviation, and Cronbach’s alpha of the six insight types
Insight Type Mean Stand Deviation Cronbach’s alpha
Detail 6.107 1.457 0.510
Comparison 4.853 1.380 0.654
Correlation 3.723 1.551 0.728
Data Summary 4.653 1.518 0.821
Trend 4.900 1.724 0.823
Self-Reflection 4.747 1.321 0.845
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 35
The think-aloud interviews are decoded, as described in the methods section. A selection of quotes
from all participants and their corresponding insight type are given in Table 4. Similar to previous
studies, who used the insight-based methodology, insights are classified as gained or not, which
makes these variables dichotomous. We determined an insight type as gained by the participant,
when at least one quote belonging to this insight type was given during the interview.
Table 4: Examples of quotes and their corresponding Insight Type
Insight Type Quote Participant
Detail “It’s clear to see the outliers, for example the 11th of April” P4 (Vis. 2)
Detail “The number of calories is highest for the 24th.” P5 (Vis. 2)
Comparison “If you compare these results with the 18th of April, then you see that there is a lot more of deep sleep for this day.
P2 (Vis. 3)
Comparison “Big differences between averages for one day.” P5 (Vis. 4)
Correlation “You can see that if a shorter distance is covered, then the speed for this distance is bigger.”
P3 (Vis. 5)
Correlation
“There is no clear correlation between the objective and subjective measurements.”
P8 (Vis. 3)
Data Summary
“This person tracked data for one month.” P1 (Vis. 1)
Data Summary
“You can see for 30 days if you were really active or not active at al.”
P14 Vis 1)
Trend “You can see a pattern for the whole month.” P4 (Vis. 1)
Trend “You see that he is warmer in the morning over the month.” P12 (Vis. 4)
Self-reflection “In this way, this person can relate his experience with what really happened that night.”
P3 (Vis. 3)
Self-reflection “For running this person probably has a specific round, because you can’t see a scheme or increase.”
P9 (Vis. 5)
When coding the recordings and determining which insight types were found with both
methodologies, We experienced that both conducting the study and analyzing the collected data
took a lot more time with the insight-based methodology as with the questionnaire.
With the insight-based methodology, each recording had to be coded to find which insight types
were found by the participants. Note that in this study only one researcher coded the recordings of
all fifteen participants. Imagine how much more time it would cost to let multiple researchers code
the recordings. With the questionnaire we easily could calculate the median of the four statements
to find out which insights can be found in the visualizations. Based on this experience we can
conclude that the questionnaire is more time-efficient to use and the collected data is easier to
analyze.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 36
3.3.2 Think-Aloud Interviews vs. Questionnaire
Comparison per Visualization
Beside verifying that the questionnaire is more efficient and easy to use, we also wanted to find out
if the insights found with the questionnaire for a certain visualization are equal to the insights found
with the insight-based methodology for the same visualization. Each participant evaluated five
visualizations. We argue that the graphs and data used in the visualization can have influenced the
insights that were found with both methodologies. Therefore, we decided to discuss the results for
each visualization separately. For each visualization, the results of the think-aloud methodology are
given by bar charts and the results of the questionnaire are presented with box plots.
The first visualization showed a person’s daily activity for one month with a stacked bar chart and a
pie chart. Comparing the results of both methodologies, we can conclude that in general the same
results are found for this visualization (Figure 8).
Both methods indicate that insight type Detail and Comparison can be achieved from the
visualization, which means that it is easy to find a specific value in the visualization or that it is easy
to make a comparison between different entities. Both methods also showed that the insight type
that is hardest to gain from this visualization is correlation. This means that it is hard to find a
positive or negative correlation between different variabels in the data. No agreement, could be
found for the other three insight types. Relative high medians were found with the questionnaire,
but during the think-aloud interviews around half of the participants could gain the insight type from
the visualization.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 37
Figure 8. Results of both methods for Visualization 1
The second visualization presented the calories a person burned for each day of a month. A bar
chart and scatter plot was used to present the data. Again, with both methods the participants
indicate that it is easy to find details from the data in the visualizations. However, the results found
for the other insight types are different for both methods (Figure 9).
High median values are found for all six the insight types, which indicate that based on the results of
the questionnaire, we would conclude that participants can achieve these insight types from the
visualization.
During the think-aloud interviews, many participants did not find some of the insight types. Thus,
based on the results of the insight-based methodology, we would conclude that only insight type
Detail and Comparison can be achieved from the visualization.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 38
Figure 9. Results of both methods for Visualization 2
Similar results were found for the third visualization and fourth visualization (Figure 9 and Figure
10), namely that the results of both methods are not in line with each other. Contrary to the results
of the second visualization, more insight types were found during the think-aloud interviews, but
with the questionnaire low median scores were given for most of these insight types.
Based on these results, we can conclude that the participants could get some insight types during
the think-aloud interviews. But when we asked them if it was easy to find these insight types, they
indicated that this was not the case.
The graphs used in these visualizations can be considered as more abstract than the graphs of the
other three visualizations. This can explain why many participants indicated in the questionnaire that
it is not easy to get those insight types. With the think-aloud interviews we only measured if they
could get the insight type or not, without considering the difficulty of finding it.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 39
Figure 9. Results of both methods for Visualization 3
Figure 10. Results of both methods for Visualization 4
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 40
The fifth visualization presented a person’s sport activities of the last month with a bubble chart and
box plots. Like with the previous two visualizations, many participants found most of the insight
types during the think-aloud interviews, but with the questionnaires median of medians scores
between 4 and 5 were found for five of the six insight types. Furthermore, the variation of the
medians score differs a lot (Figure 11).
For the previous two visualizations, the differences between the two methods were explained based
on the graphs used in the visualizations. We believe that also the type of data that is presented can
explain some differences between the two methodologies. In this fifth visualization, a person’s
running and cycling activities are presented. This type of data can be more familiar for a participant
and consequently a participant can find it easier to discover the graph. This was also indicated by
some of the participants during the think-aloud interviews. Participant 4, for example, said: “When I
cycled faster, I did more kilometers, which make sense” and also participant 10 said: “That is quite
plausible, if you go cycling you can do longer pieces, but when you run you often have your optimum,
so you cannot go faster than a certain speed.”
Figure 11. Results of both methods for Visualization 5
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 41
Comparison per Insight Type
In the previous section, we discussed per visualizations which insight types were found with both
methods or only one method. The results showed that these insight types where different per
visualizations and could be dependent on the graphs or data used in the visualizations. Another goal
of this study was to find out the differences and similarities between the two methods on insight
level. An overview of the number of times a specific insight type is found and the medians of
medians for each insight type can be found in Appendix F.
Unambiguously results for both methodologies over all five visualizations were only found for one
insight type, namely Detail. Based on the description given by Choe et al. (2015) for this insight type
(table 1), we conclude that people can state the identities of data points with extreme values,
specify the measured value or its range, and find the values of categorical variables for all five
visualizations. Furthermore, the results for this insight type also indicate that people can already
gain the insight type Detail from data-visualizations used by mobile applications nowadays, because
the evaluated visualizations are all based on existing data-visualizations. The high number of times
some details were found by the participants are in line with results found by Choe et al. (2017). They
used the think-aloud method to let 11 participants evaluate one user-interface, containing of
multiple data-visualizations and count the number of times an insight type was gained. Their results
also showed that the insight type Detail could be easily found by the participants.
Insight type Comparison was found during almost all interviews, but people did not always indicate
with the questionnaire that it is easy to make a comparison of different variables of values. One
explanation that many participants found this insight type during the interviews is that a simple
comparison of two values, entities, or graphs was already coded as Comparison. Participants 2, for
example, said during the think-aloud interview of visualization 4: “A short look to the different
evenings to see if there is a difference in body temperature.” This quote is coded as Comparison and
thus with the insight-based methodology we conclude that participants can achieve insight type
Comparison from this visualization. In the questionnaire for visualization 4, participants 2 gave a
median value of 4.5 on 7 for insight type Comparison, which indicates that the participant somewhat
agree that it was easy to get this insight type from the visualization.
Based on the results found in this study, we conclude that people are almost always able to make a
comparison between different variables and values in a visualization, but this does not necessarily
mean that they find it easy to make this comparison. Knowing if a participant experienced the
comparison of different entities as easy can be considered as a useful advantage of the developed
questionnaire.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 42
No unambiguous results were found for the other four insight types for both methodologies over all
five visualizations. The results show that for one visualization more insight types were gained with
one method and for another visualization higher scores were found with the other method. No clear
reason can be given why these results were found. Again, a possible explanation can be given based
on the different ways of measurement of the two methods. During the think-aloud interviews,
participants had to explore the visualizations without any triggering towards finding an insight type.
Contrary, in the questionnaires we asked towards how easy a specific insight type can be achieved
from the visualization. It could be the case that a participant was not able to achieve an insight type
during the think-aloud interviews, but discovered that it is easy to get this insight type from the
visualization when reading the statements from the questionnaire.
Epstein et al. (2016) showed that users stop tracking, because the visualizations did not give or no
show the information the user expected to receive. This indicates that users have expectations
before starting to discover the visualizations. These expectations can be related to those insight
types that participants did not discovered during the think-aloud interviews in this study. Therefore,
we argue that it is important to consider all possible insight types when evaluating data-
visualizations and not only those that participants will come up with during discovering these
visualizations.
3.4 Conclusion
The main goal of this second study was to compare results found with the developed questionnaire
with results of a qualitative method, both on visualization and insight level. We expected to find
similar results with the questionnaire as with the think-aloud interviews used in this insight-based
methodology. Based on the results, we can conclude that this, in general, is not the case. Different
insight types were found with the questionnaire as with the think-aloud method for four of the five
visualizations. However, some different patterns were found between the results of the two
methods.
With the analysis on visualization level, we showed that differences between the two methods can
be explained based on the data that is presented or the abstraction level of the graphs used in the
visualizations. Furthermore, the analysis on insight level showed that differences between insights
can also be explained by the different ways the methods measure if an insight type can be gained
from the visualizations. In this analysis on insight level, we also gave the advantage for the use of the
questionnaire for evaluating data-visualizations.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 43
4 General Discussion
The main of aim of this master thesis was to create a methodology to evaluate personal informatics
visualizations on insights in a time-efficient and easy way. In the first study, we successfully created
this methodology in the form of a questionnaire. In the second study, we experienced that the
questionnaire is a time-efficient and easy methodology to evaluate visualizations.
A second goal was to find out to which extent the insights found with the developed questionnaire
can be compared with insights found with a qualitative method. The results of the second study
showed that there are differences between the results of both methods. Some of these differences
can be explained based on the different ways insight types are measured with the developed
questionnaire and the insight-based methodology.
The main difference is that with the developed questionnaire, each participant evaluated all six
insight types. Contrary, with the insight-based methodology participants had to discover the
different insight types without any guiding towards these insight types, which make it possible that a
participant could not gain a specific insight type, just because he did not think of this way of
discovering the visualization.
Another difference, related to this main difference, can be found in the way the methods decide if
an insight type could be gained or not. Where the think-aloud interview only consider if an insight
type could be gained or not, the questionnaire focused on how easy a participant finds it to find an
insight type.
Based on these differences between the methods and the different results we found with both
methods, we were not able to prove the concurrent validity of the questionnaire.
However, we showed in the first study that the statements used in the questionnaire are related to
the six insight types. Furthermore, we showed in the second study that with the questionnaire we
cannot only measure if participants could find an insight type, but also if they found it easy to
achieve this insight type. This is an advantage of the questionnaire over the insight-based
methodology.
Based on this advantage and the fact that the developed questionnaire is more time-efficient and
easier to use, we argue that despite of the differences found with both methods, the developed
questionnaire can potentially be used for evaluating visualizations.
Beside the clear advantages of using the questionnaire to evaluate personal informatics
visualizations, we also found some results we did not expected.
In the first study, participants had to classify 55 statements towards one of the eight insight types.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 44
An average classification rate of 54.76 (SD=10.66) was found. Which means that around half of the
statements were classified towards the insight type for which the statement was created. However,
all 55 statements were based on quotes which were coded towards these insight types in previous
studies. These results indicate that with the qualitative methods often quotes are categorized by a
researcher to an insight type, which could be coded to a different insight type by other researchers.
It is not implausible that the same bias occurred during the coding of the think-aloud interviews in
the second study. This, again, showed the complexity of the qualitative methods to evaluate
visualizations on insights.
The insight type ‘self-reflection’ was only found in 40 of the 75 think-aloud interviews (Appendix F).
This is little compared to the results found by Choe et al. (2017). They also evaluated a visualization
with the insight-based methodology and in their study ‘self-reflection’ was the insight type that was
found the second most.
To give the participants the possibility to self-reflect on the data, the questions related to self-
reflection started with the sentence: “Imagine that the presented data is yours…”. Also during the
think-aloud interview, the researcher asked the question: “Imagine that this data is yours, what can
you learn from this data?”. Nevertheless, it can still be difficult to reflect on data that is not yours,
therefore we also coded quotes where the participants reflected in name of the person as self-
reflection. Example quotes are: “there are no measurements for these two days, I think he was not
wearing his device (P10, visualization 1)” and “Probably he realized that he was really lazy and
decided to do something (P9, visualization 2)”.
This illustrates that even when asking the participants to imagine that the presented data is theirs, it
is still difficult to evaluate visualization in this way. This is in line with findings of previous studies
(Thudt, Lee, Choe, & Carpendale, 2017).
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 45
5 Limitations
Several limitations arose during the two studies. In this section we address these limitations such
that researcher can take them into account in further research.
In the first study, participants had to fill in an online survey. After sharing this survey with fellow
students, we noticed that many participants stopped after getting the descriptions of the different
insight types (table 1). This can indicate that these participants could not understand these
descriptions. Because many participants did fill in the questionnaire at that point and we did not
want to influence the results of this study, we decided to not make any modification to the survey.
A second limitation is related to the design of the visualizations. Despite that the visualizations were
based on existing visualizations, there were some design aspects that could hinder participants to
gain insights. The created data-visualizations were given on a laptop, while the existing visualizations
all came from mobile fitness applications. The interactivity between visualizations on a mobile phone
can differ from the interactivity the participants had on the laptop. P6 for example said: “I recognize
this visualization from my Fitbit, but normally you can easily zoom in.”
A third limitation of the second study is the skills of the participants. All participants were students
at Eindhoven university of technology (TU/e) and registered in the database of the Human-
Technology group. Students at a technological university have some basic knowledge in statistics.
Therefore, it is possible that they were also better in comprehending the graphs and in getting
knowledge from the data-visualizations. This can explain the high number of insights that were
gained by the participants. As described in the introduction, nowadays everyone can start tracking
their data, which means that self-trackers without any experience with graphs should be able to get
insights from the visualizations as well.
A fourth limitation is also related to the participants, participating in the second study. During the
think-aloud interview, the researcher noticed that some of the participants were wearing a smart
watch. These participants could be more experienced with the data-visualizations than the other
participants. Participants 13, for example, said during the interview: “I recognize this from my Fitbit.”
Unfortunately, we did not ask the participants during the study if they had any experience with self-
tracking. The results do not indicate that these participants had an influence on the results we
found. Nevertheless, more research need to be done to see if there is a relation between the
experience of the self-tracker and the way he evaluates personal informatics visualizations.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 46
6 Future Research
In the introduction, we explained that people stop tracking, because they are not able to get the
right insights from the visualizations. However, the results of the second study illustrated that
existing visualizations already contain most of these insight types. This indicates that it is not only
important to know which insight types people can get from the visualizations, but also which types
they want to get from it. Questions that need to be answered in future research are: “which insight
types are most important for self-trackers?” and “Are certain insight types more important for one
type of visualizations and data measurements than for another?”.
The results of the second study showed that the graphs used in a visualization and the data
presented in these visualizations influences the differences and similarities between the two
methodologies. In his master thesis, fellow student René Van Knippenberg investigated the relation
between abstraction level of graphs and the insights people can find. In his study, he also used the
insight-based methodology to measure the insight types participants found. Further research needs
to be done to investigate the same relation between graphs and insight types, but measured with
the developed questionnaire.
Some insights were found by every participant, where for other insights only half of the participants
indicated that they could get these insights. Further research can be done towards these individual
differences. Are there different groups of self-trackers? Or which factors can explain the differences
between individuals? Till now, few research is done towards different types of self-trackers. In her
master thesis, Christina Vestergaard investigated the influence of personality types on acceptance
and adoptance of self-tracking devices. This a first step towards categorizing self-trackers in different
groups.
There were not only Individual differences between the insights participants found, but also
between the insights a participant found with one method compared to the other method. One
reason, we did not mention yet, can be that some participants did not feel confident during the
think-aloud interview or had difficulties to talk at loud about everything they did and thought. In this
study, we gave the participants the chance to practice the method. However, it still possible that,
even after practicing, some participants are more fluent in it than others. Further research can be
done towards the relation between how confident people are during the think-aloud interview and
the insights people can get.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 47
7 Conclusion
In this master thesis we presented a questionnaire that can be used to evaluate personal informatics
visualizations. This questionnaire will make it easier for researchers and developers to evaluate their
data visualizations on the insights people can get from, before they are presented to the user.
We showed that based on the growing field of the quantified-self domain and the complexity of
current evaluating methods, there was a need for a new time-efficient and easy tool to evaluate
personal informatics visualizations.
This tool was created in the first study of this master thesis in the form of questionnaire. The results
of this study showed that the statements used in this questionnaire are strongly related to the
different insight types. Based on the way these statements were created, we conclude that they can
be used to evaluate visualizations.
The comparison of this questionnaire with a qualitative method, performed in the second study,
showed the similarities and differences of our tool with this method. Furthermore, the results
showed some advantages of the questionnaire. Not only the availability of an insight type can be
measured, but also how easy it is for the user to get this insight type from the visualization.
Based on the results found in these two studies, we conclude that the developed questionnaire can
potentially be used as an easy an efficient way to evaluate personal informatics visualizations.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 48
References
About the Quantified Self. (2016, November 01). Retrieved April 01, 2018, from
http://quantifiedself.com/about/
Choe, E. K., Lee, N. B., Lee, B., Pratt, W., & Kientz, J. A. (2014). Understanding quantified-
selfers' practices in collecting and exploring personal data. In Proceedings of the 32nd annual ACM
conference on Human factors in computing systems, 1143-1152.
Choe, E. K., Lee, B., Zhu, H., Riche, N. H., & Baur, D. (2017). Understanding Self-Reflection:
How People Reflect on Personal Data through Visual Data Exploration. In Proceedings of the 11th EAI
International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth’17).
ACM, New York, NY, USA, 10.
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (Eds.). (1999). Readings in information
visualization: using vision to think. Morgan Kaufmann.
Chen, C. (2010). Information visualization. Wiley Interdisciplinary Reviews: Computational
Statistics, 2(4), 387-403.
Choe, E. K., Lee, B., Schraefel, M. C. (2015). Characterizing visualization insights from
quantified selfers' personal data presentations. IEEE computer graphics and applications, 35(4), 28-
37.
Collins, E. I., Cox, A. L., Bird, J., & Cornish-Tresstail, C. (2014, December). Barriers to
engagement with a personal informatics productivity tool. In Proceedings of the 26th Australian
Computer-Human Interaction Conference on Designing Futures: the Future of Design (pp. 370-379).
ACM.
Consolvo, S., McDonald, D. W., & Landay, J. A. (2009, April). Theory-driven design strategies
for technologies that support behavior change in everyday life. In Proceedings of the SIGCHI
conference on human factors in computing systems (pp. 405-414). ACM.
Cuttone, A., Lehmann, S., & Larsen, J. E. (2013). A mobile personal informatics system with
interactive visualizations of mobility and social interactions. In Proceedings of the 1st ACM
international workshop on Personal data meets distributed multimedia (pp. 27-30). ACM.
Ellis, G., & Mansmann, F. (2010). Mastering the information age solving problems with visual
analytics. In Eurographics (Vol. 2).
Epstein, D. A., Caraway, M., Johnston, C., Ping, A., Fogarty, J., & Munson, S. A. (2016).
Beyond Abandonment to Next Steps: Understanding and Designing for Life after Personal
Informatics Tool Use. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
. CHI Conference, 2016, 1109–1113. http://doi.org/10.1145/2858036.2858045
Epstein, D. A., Ping, A., Fogarty, J. and Munson, S. (2015). A lived informatics model of
personal informatics. Proceedings of the 2015 ACM International Joint Conference on Pervasive and
Ubiquitous Computing - UbiComp '15, 731-741.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 49
Etkin, J. (2016). The Hidden Cost of Personal Quantification. Journal of Consumer Research,
Volume 42(6), 967–984. https://doi.org/10.1093/jcr/ucv095
Fan, C., Forlizzi, J., & Dey, A. K. (2012, September). A spark of activity: exploring informative
art as visualization for physical activity. In Proceedings of the 2012 ACM Conference on Ubiquitous
Computing (pp. 81-84). ACM.
Few, S., & Edge, P. (2008). What ordinary people need most from information visualization
today. Perceptual Edge: Visual Business Intelligence Newsletter, 1-7.
Flood, J. (2013). Wearable Computing Devices, ABI Research. Retrieved from
https://www.abiresearch.com/press/wearable-computing-devices-like-apples-iwatch-will/
Forsell, C., & Cooper, M. (2012, October). Questionnaires for evaluation in information
visualization. In Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors-Novel Evaluation
Methods for Visualization (p. 16). ACM.
Govaerts, S., Verbert, K., Duval, E., & Pardo, A. (2012, May). The student activity meter for
awareness and self-reflection. In CHI'12 Extended Abstracts on Human Factors in Computing
Systems (pp. 869-884). ACM.
Hogan, T., Hinrichs, U., & Hornecker, E. (2016). The Elicitation Interview Technique:
Capturing People's Experiences of Data Representations. IEEE transactions on visualization and
computer graphics, 22(12), 2579-2593.
Kay, M., Choe, E. K., Shepherd, J., Greenstein, B., Watson, N., Consolvo, S., & Kientz, J. A.
(2012, September). Lullaby: a capture & access system for understanding the sleep environment.
In Proceedings of the 2012 ACM conference on ubiquitous computing (pp. 226-234). ACM.
Keim, D. A. (2002). Information visualization and visual data mining. IEEE transactions on
Visualization and Computer Graphics, 8(1), 1-8.
Keim, E. D., Kohlhammer, J., & Ellis, G. (2010). Mastering the Information Age: Solving
Problems with Visual Analytics, Eurographics Association.
Kerren, A., Plaisant, C., & Stasko, J. T. (2011). Information visualization: State of the field and
new research directions. Information Visualization, 10(4), 269.
Kerren, A., Stasko, J., Fekete, J. D., & North, C. (Eds.). (2008). Information Visualization:
Human-Centered Issues and Perspectives (Vol. 4950). Springer.
Kersten-van Dijk, E. T., Westerink, J. H., Beute, F., & IJsselsteijn, W. A. (2017). Personal
informatics, self-insight, and behavior change: A critical review of current literature. Human–
Computer Interaction, 32(5-6), 268-296.
Khot, R. A., Lee, J., Aggarwal, D., Hjorth, L., & Mueller, F. F. (2015). Tastybeats: Designing
palatable representations of physical activity. In Proceedings of the 33rd Annual ACM Conference on
Human Factors in Computing Systems (pp. 2933-2942). ACM.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 50
Khot, R. A., Pennings, R., & Mueller, F. F. (2015). EdiPulse: supporting physical activity with
chocolate printed messages. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts
on Human Factors in Computing Systems (pp. 1391-1396). ACM.
Li, I., Dey, A. & Forlizzi, J. (2010). A stage-based model of personal informatics systems. Proc.
CHI '10, 557-566.
Li, I., Dey, A. K., & Forlizzi, J. (2011). Understanding my data, myself: supporting self-
reflection with ubicomp technologies. In Proceedings of the 13th international conference on
Ubiquitous computing, 405-414.
Marcengo, A., & Rapp, A. (2014). Visualization of human behavior data: the quantified self.
Innovative approaches of data visualization and visual analytics, 1, 236-265.
Moere, A. V. (2008, July). Beyond the tyranny of the pixel: Exploring the physicality of
information visualization. In Information Visualisation, 2008. IV'08. 12th International
Conference (pp. 469-474). IEEE.
North, C. (2006). Toward measuring visualization insight. IEEE computer graphics and
applications, 26(3), 6-9.
Plaisant, C., Fekete, J. D., & Grinstein, G. (2008). Promoting insight-based evaluation of
visualizations: From contest to benchmark repository. IEEE transactions on visualization and
computer graphics, 14(1), 120-134.
Robertson, G., Czerwinski, M., Fisher, D., & Lee, B. (2009). Selected human factors issues in
information visualization. Reviews of human factors and ergonomics, 5(1), 41-81.
Saraiya, P., North, C., & Duca, K. (2005). An insight-based methodology for evaluating
bioinformatics visualizations. IEEE transactions on visualization and computer graphics, 11(4), 443-
456.
Rooksby, J., Rost, M., Morrison, A., & Chalmers, M.C. (2014). Personal tracking as lived
informatics. Proc. CHI’14, 1163-1172. https://doi.org/10.1145/2556288.2557039
Sharmin, M., & Bailey, B. P. (2013, June). ReflectionSpace: an interactive visualization tool
for supporting reflection-on-action in design. In Proceedings of the 9th ACM Conference on Creativity
& Cognition (pp. 83-92). ACM.
Smuc, M., Mayr, E., Lammarsch, T., Bertone, A., Aigner, W., Risku, H., & Miksch, S. (2008,
November). Visualizations at First Sight: Do Insights Require Training?. In Symposium of the Austrian
HCI and Usability Engineering Group (pp. 261-280). Springer, Berlin, Heidelberg.
Statista. (2017, February). Number of connected wearable devices worldwide from 2016 to
2021 (in millions). Retrieved from https://www.statista.com/statistics/487291/global-connected-
wearable-devices/
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 51
Stusak, S., Tabard, A., Sauka, F., Khot, R. A., & Butz, A. (2014). Activity sculptures: Exploring
the impact of physical visualizations on running activity. IEEE Transactions on Visualization and
Computer Graphics, 20(12), 2201-2210.
Thudt, A., Lee, B., Choe, E. K., & Carpendale, S. (2017). Expanding research methods for a
realistic understanding of personal visualization. IEEE computer graphics and applications, 37(2), 12-
18.
Verdezoto, N., & Grönvall, E. (2016). On preventive blood pressure self-monitoring at
home. Cognition, Technology & Work, 18(2), 267-285.
Yang, H., Li, Y., & Zhou, M. X. (2014). Understand users’ comprehension and preferences for
composing information visualizations. ACM Transactions on Computer-Human Interaction
(TOCHI), 21(1), 6.
Zhao, O. J., Ng, T., & Cosley, D. (2012, February). No forests without trees: particulars and
patterns in visualizing personal communication. In Proceedings of the 2012 iConference (pp. 25-32).
ACM.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 52
Appendices
Appendix A The 55 statements participants had to classify to one of the eight insight types.
Table A1 shows the 55 statements that were created based on the quotes found in previous
literature and the descriptions of the insight types given by Choe et al. (2015) (Table 2). The
statements are ordered per insight type.
Table A1: Statements participants had to classify
Insight Type Statement
Detail “It is easy to find the exact running distance”
Detail "it is easy to get the difference between my highest and lowest heart rate today.”
Detail "It is easy to check the highest number of steps of last week”
Detail “The legends, next to it, are helping to better understand the different colors used."
Detail “It is easy to find the current heart rate (bpm).”
Detail “It is easy to explain the meaning of the different colors.”
Detail “At the end of the day, it clearly shows which different types of physical activities were done that day.”
Detail “It is easy to find the highest and lowest heart rate measured today”
Comparison “It is easy to find the days of the week with more physical activities than the rest of the week.”
Comparison “The combination of both the sleep data and the physical activity data gives a good explanation of what the relation is between them.”
Comparison “Knowing the average hours of sleep people have, it gives the opportunity to quickly see what the relation towards this average is.”
Comparison "it is easy to see the differences and similarities in sleep between two specific nights.”
Comparison “It is easy to compare the happiness level with the number of hours slept.”
Comparison "it is easy to compare two of my latest runs, based on the speed and the distance.”
Comparison “It is easy to compare different days of the week”
Comparison "It is easy to find out what this physical level means in relation to other people".”
Correlation “It gives me the opportunity to find positive or negative relations between different measurements.”
Correlation “It is easy to see that sleep gets poor when stress is higher.”
Correlation “A negative relation between two different data types can easily be found in this visualization.”
Correlation “From the data, it is easy to tell what the relationship between the hours of sleep and the number of steps is.”
Correlation “If there is a relation between hours of sleep and number of steps, it would be
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 53
easy to find this relation.”
Correlation "It is easy to compare different measurements and the relation between these measurements.”
Data Summary “It is easy find the total number of runs done.”
Data Summary “It is easy to quickly find the average distance and speed over all tracked running activities. “
Data Summary “It is easy to check the total number of data points.”
Data Summary "It is easy to get a clear overview of all the collected data based on time and number of data points.”
Data Summary “The data is summarized in a nice way.”
Data Summary “It is easy to see for how long that data is tracked.”
Data Summary “A good overview of all the measured data from beginning till now is given.”
Trend “It is easy to see if there is a tendency in the number of steps daily taken.”
Trend “it gives the opportunity to check in which way the resting heart rate changed over the last two years.”
Trend "it gives me a clear overview of the different waking up time throughout the week.”
Trend “It is easy to find recurring tendencies in the data.”
Trend “It is easy to find if the heart rate in rest is increased or decreased over a longer period of time.”
Trend “It is easy to describe changes over a longer time.”
Self-reflection “There is a lot of contradiction between what is shown and the things I did today.”
Self-reflection “The collected data is in line with my expectations, based on what occurred today…”
Self-reflection “It is easy to make predictions based on this data.”
Self-reflection “It is in line with what was already known about my sleeping behavior.”
Self-reflection “Good predictions of how my physical activity would evolve are made.”
Self-reflection “It is possible to explain the atypical values in the data, based on things that occurred that day. ”
Self-reflection “It is possible to explain the data, knowing that the person wasn’t in an everyday affair, like for example an injury.”
Self-reflection “Based on what I did today, I didn’t expect to find this.”
Distribution "It is easy to see how much percentage of the time is spent on sleeping, walking, studying…”
Distribution it’s easy to find out if the physical activity level is different for different periods in time.”
Distribution It is easy to tell during which parts of a day the heart rate is variating more.”
Distribution “It is possible to explain how one day is divided in sleeping, working, studying, leisure time, and physical activity.”
Distribution "It is shown in a clear way how much of the time the person is sitting, running, walking, or doing other activity.”
Distribution "It is easy to check in which way the physical activities are spread on a monthly basis.”
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 54
Outlier “It is easy to find those days that the person was on a city-trip and was walking all day.”
Outlier “It is possible to find those nights that the person didn’t sleep last month.”
Outlier "it is easy to identify data points that are totally different from all other points.”
Outlier “It is easy to find measurements that are not in line with similar measures before and after.”
Outlier “it is easy to find the days the person had to do a job interview, because the average stress level is a lot higher those days.”
Outlier “It is easy to see that there are some measurements which are totally different from the other measurements.”
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 55
Appendix B
Informed consent used in the first study of this master thesis
Informed consent form
This document gives you information about the study “Getting Insights from Quantified Self Visualizations”. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.
Aim and Procedure of the study
The aim of this study is to find out if an initial pool of items is assessing the terms they are created
for. This information is used to further develop a methodology for evaluating quantified-self data
visualizations. This study is performed by Karsten Paulussen, a student under the supervision of
dr.ir. Peter Ruijten and Els Kersten – Van Dijk MSc of the Human-Technology Interaction group.
In this survey, you will get 55 statements. For each statement you should indicate to which insight it refers best, so to which insight you think the statement is related. You always have to choose between 8 different insights. On the next page short definitions of the insights are given. During the survey you can always reread this definitions.
Duration
The study will last approximately 10-15 minutes.
Voluntary & Confidentiality
Your participation is completely voluntary. You can refuse to participate without giving any reasons
and you can stop your participation at any time during the study. You can also withdraw your
permission to use your data up to 24 hours after the study is finished.
All research conducted at the Human-Technology Interaction Group adheres to the Code of Ethics of the NIP (Nederlands Instituut voor Psychologen – Dutch Institute for Psychologists).
The information that we collect from this study is used for writing scientific publications and will only be reported at group level. It will be completely anonymous and it cannot be traced back to you.
Further information
If you want more information about this study you can ask Karsten Paulussen (contact email:
If you have any complaints about this study, please contact the supervisor, Peter Ruijten ([email protected]).
Certificate of Consent
By clicking the 'next' button, you agree that you have read and understood this consent form and
you give your informed consent for voluntary participation in this research.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 56
Appendix C
Final Questionnaire created in the first study of the Master Thesis
As described in the discussion section of the first study, the final questionnaire exists of 24
statements. Four each statement, participants have to indicate on a 7-point scale if they agree with
the statement or not. In Figure C1 the statements are ordered per insight type.
Figure C1. The final questionnaire, six statements are included for each insight type. Following statements belong to the following insight types: 1-4: ‘detail’, 5-8: ‘Correlation’, 9-12: ‘Data Summary’, 13-16: ‘Comparison’, 17-20: ‘Trend’, and 21-24: ‘Self-reflection’.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 57
Appendix D
Modified Questionnaire for each of the five visualizations
Figure B1 gave the final questionnaire created in the first study of this master thesis. To use this
questionnaire for evaluating the visualizations in the second study, we had to create a modified
version for each visualization (Figure D1 – D5).
Three different types of modification needed to be done to make the questionnaire applicable for
evaluating a visualization.
Firstly, the variables used in the statements are changed to variables that are shown in the
visualizations. For example, for the first visualization, the statement “it is easy to compare the
happiness level with the number of hours slept” is changed to “It is easy to compare the minutes
lightly active with the minutes very active.”
Secondly, the visualizations presented in the second study were created from one-month Fitbit data
from one of the supervisors or simulated. This means that participants had to self-reflect on data,
they did not collect themselves. Therefore, the statements corresponding to the insight type self-
reflection started with the sentence: “Imagine the data presented in the visualizations is yours: …” In
this way, we try to give the participants the possibility to self-reflect on this data.
Thirdly, the statements are presented in a random order.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 58
Figure D1. Modified Questionnaire to evaluate Visualization 1.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 59
Figure D2. Modified Questionnaire to evaluate Visualization 2.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 60
Figure D3. Modified Questionnaire to evaluate Visualization 3.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 61
Figure D4. Modified Questionnaire to evaluate Visualization 4.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 62
Figure D5. Modified Questionnaire to evaluate Visualization 5.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 63
Appendix E
Informed Consent used in the second study of this master thesis
Informed consent form
This document gives you information about the study “Discovering data-visualizations”. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.
Aim and benefit of the study
The aim of this study is to measure the information you get and the knowledge you gain from
quantified-self visualizations. These visualizations are made from data people can collect
themselves, like physical activity, sleep patterns, heart rate, etc. This information is used further
develop a methodology to evaluate quantified-self visualizations on the information and knowledge
they offer.
This study is performed by Karsten Paulussen, a student under the supervision of dr. E.T. Kersten – van Dijk of the Human-Technology Interaction group.
Procedure
During this study, you will get to see and evaluate 6 different graphs. For each of these graphs, we
ask you to freely explore the graph and think-aloud about what you can learn about it and the
information you get form the graph. The researcher will sit next to you and can ask extra questions
based on what you learn from the graph. When you think you told everything about a graph, you can
indicate this to the researcher and a next graph is shown. Audio recordings are made during this
study. After you explored all graphs, you will complete a few questionnaires, in which you will get
some statements you will rate on a 7-item scale. After filling in these questionnaires, the study ends
and you are free to ask questions.
Risks
The study does not involve any risks or detrimental side effects.
Duration
The study will last approximately 40 minutes.
Participants
You were selected because you were registered as participant in the participant database of the
Human Technology Interaction group of the Eindhoven University of Technology.
Voluntary Your participation is completely voluntary. You can refuse to participate without giving any reasons
and you can stop your participation at any time during the study. You can also withdraw your
permission to use your data up to 24 hours after the study is finished. All this will have no negative
consequences whatsoever.
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 64
Compensation
You will be paid 7 euros (€2.00 extra if you do not study or work at the TU/e or Fontys Eindhoven).
Confidentiality
All research conducted at the Human-Technology Interaction Group adheres to the Code of Ethics of
the NIP (Nederlands Instituut voor Psychologen – Dutch Institute for Psychologists).
We will not be sharing personal information about you to anyone outside of the research team. The
audio recordings that are made, will not be distributed and will not be played back in the presence
of persons other than the researchers. The material will be used only for scientific analysis. The
information that we collect from this study is used for writing scientific publications and will only be
reported at group level. It will be completely anonymous and it cannot be traced back to you.
Further information
If you want more information about this study you can ask Karsten Paulussen (contact email:
If you have any complaints about this study, please contact the supervisor, dr. Els Kersten – van Dijk
(contact email: [email protected]).
Certificate of Consent
I, (NAME)……………………………………….. have read and understood this consent form and have been
given the opportunity to ask questions. I agree to voluntarily participate in this study carried out by
the research group Human Technology Interaction of the Eindhoven University of Technology.
Participant’s Signature Date
QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 65
Appendix F
An overview of the number of times a specific insight type is found and the medians of medians for
each insight type for all five visualizations (Table F1 and Table F2).
Table F1: Frequency of insight types for think-aloud interviews
Frequency
Type Vis. 1 Vis. 2 Vis. 3 Vis. 4 Vis. 5 Total Frequency
Detail 15 15 15 13 14 72
Comparison 15 13 15 14 15 72
Correlation 0 10 12 11 13 46
Data Summary 9 7 9 11 13 49
Trend 8 7 3 14 4 36
Self-reflection 8 5 11 8 8 40
Table F2: Median of Medians with IQR for every insight type from Questionnaire
Median of medians (IQR)
Type Vis. 1 Vis. 2 Vis. 3 Vis. 4 Vis. 5
Detail 6.5 (1.0) 6.5 (0.5) 5.5 (2.0) 6.0 (1.0) 6.5 (1.5)
Comparison 6.0 (1.5) 5.5 (1.5) 4.0 (3.5) 4.5 (1.5) 5.0 (1.0)
Correlation 3.5 (3.0) 5.0 (1.0) 3.0 (2.5) 3.5 (1.5) 4.0 (1.5)
Data Summary 6.0 (1.0) 5.5 (1.5) 4.0 (3.5) 5.0 (2.5) 5.0 (2.5)
Trend 5.5 (2.0) 6.0 (1.0) 2.5 (1.0) 6.0 (1.5) 5.0 (2.5)
Self-reflection 5.5 (1.0) 5.5 (1.0) 4.0 (2.0) 4.5 (1.5) 5.0 (2.5)
Top Related