Download - Eindhoven University of Technology MASTER Development of a ... · 1.1 Quantified Self and Personal Informatics 4 1.2 Models of Personal Informatics 5 1.3 From Information Visualizations

Eindhoven University of Technology

MASTER

Development of a questionnaire for evaluating personal informatics visualizations

Paulussen, K.

Award date:2018

Link to publication

DisclaimerThis document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Studenttheses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the documentas presented in the repository. The required complexity or quality of research of student theses may vary by program, and the requiredminimum study period may vary in duration.

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

https://research.tue.nl/en/studentthesis/development-of-a-questionnaire-for-evaluating-personal-informatics-visualizations(9a621cee-ea8d-4564-a9ba-9aa432e539bc).html

Eindhoven, 11-07-2018

identity number 0870509

in partial fulfilment of the requirements for the degree of

Master of Science

in Human-Technology Interaction

Supervisors:

dr. E.T. Kersten - van Dijk MSc

dr.ir. P.A.M. Ruijten

Development of a Questionnaire for

Evaluating Personal Informatics

Visualizations

by Karsten Paulussen

QUESTIONNAIRE FOR EVALUATING DATA-VISUALIZATIONS 2

Contents

Abstract 3

1 Introduction 4

1.1 Quantified Self and Personal Informatics 4

1.2 Models of Personal Informatics 5

1.3 From Information Visualizations to Personal Visualizations 7

1.4 What is Insight 8

1.5 Which Insight types exist 9

1.6 Data Visualizations 10

1.7 Evaluating Personal Visualizations 13

1.8 Need for a New Evaluation Methodology 16

2 Study I: Creating the Questionnaire 17

2.1 Introduction 17

2.2 Methods 17

2.3 Results 19

2.4 Discussion 21

2.5 Conclusion 23

3 Study II: Comparison of the Questionnaire with the Insight-Based Methodology 24

3.1 Introduction 24

3.2 Methods 24

3.3 Results & Discussion 34

3.4 Conclusion 42

4 General Discussion 43

5 Limitations 45

6 Future Research 46

7 Conclusion 47

References 48

Appendices 52


Abstract

The rise of wearables, mobile applications, and other self-tracking tools make it possible to collect

many kinds of data about ourselves. However, many people stop collecting data after a while. One

reason that these people stop tracking is that the visualizations used in these tools do not offer them

the right insights to gain self-knowledge. To know which insights the users can achieve from the

visualizations, researchers and developers need to be able to evaluate these visualizations. Till now,

the methods used to do these evaluations are all time-consuming and complex to conduct. In this

master thesis an easier and more efficient tool for evaluating visualizations is created in the form of

questionnaire. This questionnaire was created in a first study and contains 24 statements. In a

second study, the results of evaluations with the questionnaire were compared with the results of

one of the more time-consuming methods. We found some similarities and differences between the

results of both evaluation methods. These similarities and differences are discussed and advantages

for using the developed questionnaire as an evaluation tool are given.


1 Introduction

As a result of the recent advances in consumer technologies, more and more people are able to

collect a wide range of personal data. The rise of wearables, smartphone applications, and other

self-tracking tools make it possible to keep track of our sleep pattern, daily exercise, energy use, etc.

Flood (2013) predicted 5 years ago that by now, 485 million wearable computing devices will be in

the market. However, statistics showed that this number was already reached last year (Statista,

2017). People start self-tracking for different reasons, for example to find a relation between

different measurements, to document their physical activities, or to set goals (Rooksby, Rost,

Morrisen, & Chalmers, 2014). Despite of the increasing number of users, one of the biggest concerns

is that people often stop using their devices and stop tracking their data. In an overview of reasons

why people stop tracking by survey almost 200 people (Epstein, Caraway, Johnston, Ping, Fogarty, &

Munson, 2016), one of the main reasons was that the visualizations shown by the tool or application

did not give or show the information the user expected to receive. Users expect to achieve self-

knowledge from their data. This self-knowledge can be obtained by getting insights out of their data.

Therefore, it is important that the proposed visualizations contain these insights. To make sure the

visualizations offer the right insights to the user, it is important that these visualizations are

evaluated and a quantification of insights has occurred (Ellis, & Mansmann, 2010). In this master

thesis, a methodology is developed to give developers and researchers the opportunity to make an

evaluation of the personal informatics visualizations and the corresponding insights people can

achieve from these visualizations.

1.1 Quantified Self and Personal Informatics

The practice of collecting and reflecting of self-tracked data is called personal informatics, personal

analytics or quantified self (Li, Dey, & Forlizzi, 2011). Besides, it is used as a term for referring to the

practice of self-tracking, Quantified Self is also the name of a community (Choe, Lee, Lee, Pratt, &

Kientz, 2014). This community started in Silicon Valley by two technology fanatics, namely Gerry

Wolf and Kevin Kelly. They were both tracking their behavior and building self-monitoring

technology. In 2008, they started the site quantifiedself.com and two years later Wolf gave a pitch at

a TED conference (Marcengo & Rapp, 2014), which can be considered as the start of the Quantified

Self movement. The community became bigger and nowadays it organizes meetings all over the

world where people share the insights they got from their tracked data or companies promote their

self-monitoring devices (About the Quantified Self, 2016). The aim of the rise of the quantified self is

to give people a better understanding of their behavior based on collected data, such that behavior

changes can be made to live happier and healthier lives (Etkin, 2016). Marcengo and Trapp describe


three basic points that define the QS movement, namely: 1) data collection (behavioral, contextual,

or psychological), 2) the displaying of these data, and 3) cross-linking the data to get insights from it.

All three such that the user can achieve self-knowledge from the data. In the beginning, members of

the QS community were all so-called technology enthusiastic, but with the rapid advancements in

technology all kind of people can easily track personal data by using commercial trackers or mobile

apps (Choe, Lee, Zhu, Riche, & Baur 2017), which means that the three basic points, given before,

should hold for all people who want to collect and reflect on self-tracked data. This shift from

technology enthusiasts to everyday people prompted some challenges in the Quantified Self field,

with as mean challenge: ‘why do people stop self-tracking?’. The field of personal informatics, which

is used as term in academic research, studied many self-tracking technologies and recognized the

value of these technologies in the domain of health and well-being. However, a list of barriers

towards the adoption of these technologies is identified (Li, Dey, & Forlizzi, 2011), which included

unaccepted visualization, poor analyzing skills, or no possibility to cross-link the data. To solve these

barriers towards the adoption of self-tracking technologies, more research in the field of personal

informatics needs to be done. But first it is important to understand the process self-trackers go

through and allocate these barriers in this process.

1.2 Models of Personal Informatics

To better understand the process self-trackers go through and how people use personal informatics,

different models are developed.

The best-known model is the stage-based model from Li, Dey, and Forlizzi (2010). They conducted a

survey with 68 people who collect and reflect on personal information. From the answers, they

define a model that consists of 5 iterative stages and for each of these stages they compiled a list of

barriers (Figure 1).

Figure 1. Five-stage model of personal informatics (adapted from Li et al., 2011)


First is the preparation stage, this stage is about people’s motivation to start collecting and their

decision what and how they want to collect the personal information. Examples of barriers people

encounter during this stage are: choosing the wrong tool or choosing the wrong data to collect.

Next is the collection stage, during this stage people collect information about themselves. Barriers

in this stage can be tool-, user-, or data-related. E.g. the tool is not accessible at the right moment in

time, the user forgot to record data, or the data rely on subjective estimations.

The third stage in the model, is the integration stage. In this stage the collected data is prepared,

combined, and transformed for the user. The integration phase can be long or short. We speak

about a short stage when synchronization of different data over different devices is going

automatically. Examples of barriers in the integration stage are bad transcribing of the data and

scattered visualizations.

The fourth stage of the model is the reflections stage. In this stage the user reflects on their personal

information. Two types of reflection phases exist. Namely, the discovery phase, in which users try to

find behavior to change and the maintenance phase, in which users try to keep maintenance of the

behavior they changed. Problems in the reflection stage occurs when users have difficulties with

retrieving or understanding information.

The fifth and last stage is the action stage, in this stage the user choose what he will do with the

personal information retrieved. The model has some important properties designer and researcher

should consider, namely: 1) the model is iterative; 2) Problems in an earlier stage affect later stages;

3) Each stage is user-driven, system-driven, or both; 4) personal informatics systems can be uni-

facets or multi-facets. Li, Dey, and Forlezzi (2010) believe that this stage-based model, with the

barriers and properties, can help researchers and developers to find solutions and develop new

approaches.

Such a new approach, based on the stage-based model of Li and colleagues, is developed by Epstein,

Ping, Fogarty, and Munson (2015). They argue that the use of personal informatics devices cannot be

strictly separated into stages and therefore introduced the Lived informatics model, an extended

version of the model from Li et al. (2010).

Epstein and colleagues (2015) argue that people not only start tracking with as goal to change their

behavior, but also with other motivations, like comparing their activities with others, or curiosity.

Therefore, they divide the preparation stage into a deciding stage and a selecting stage. In this

master thesis, we assume that users can have different end-goals, but that the initial goal wherefore

people start tracking is to gain knowledge about themselves based on data.

Next, Epstein et al. (2015) also argue that the collection, integration, and reflection stage are an

ongoing process where the different stages can occur simultaneously. They called this process


tracking and acting.

Furthermore, Epstein and colleagues (2015) also include the lapsing stage, users arrive at this stage

when they stop using the self-tracking system. Lapsing can start by barriers in the collection,

integration, or reflection stage.

Lastly, they also include the possibility to resume in their model. Users can resume tracking their

data for the same or different reasons as before. We agree with the idea of an ongoing process, for

which in each stage the possibility exists that the user stop-tracking. We described before that there

are different reasons why people stop tracking. For each of these reasons associated barriers can be

described.

It is important that we find solutions to overcome these barriers and give users a maximal ability to

get insights out of their data. In this master thesis we will focus on one specific barrier users

encounter during the integration- and reflection stage, namely that the user is not able to get

insights from the visualizations of the presented data, because it does not contain the information

the user wants to get from it.

1.3 From Information Visualizations to Personal Visualizations

From the end of the 1980s big progress is made in hardware technology, which makes it possible for

companies and researchers to store large amounts of data (Keim, 2002). These collections of large

amounts of data also asked for new ways to mining and presenting this data, which lead to the rise

of the Information Visualization (InfoVis) community (Ellis, & Mansmann, 2010). Robertson,

Czerwinski, Fisher, and Lee (2009) describe information visualization as “The art and science of

representing abstract information in a visual form that enables human users to gain insight through

their perceptual and cognitive capabilities (p. 1)”. Two important aspects of this information

visualization community are: 1) Information visualization is used to discover new insights and

knowledge from the data, and 2) Information visualizations are representations of data that

amplifies human cognition (Chen, 2010). Both the definition of Robertson et al. (2009) and the

important aspects described by Chen refer to the importance of taking into account the human’s

capabilities when designing data visualizations. Therefore, many research in the InfoVis community

is done towards the way humans process and perceive data visualizations. Like for example the

influence Gestalt Principles (cognitive psychology) on information visualizations (Kerren, Stasko,

Fekete, & North, 2008). Different topics challenged by the community nowadays are for example

‘transformation of data to enable analysis’, ‘visual design and aesthethics’, and comparison in

information visualization: models and challenges’ (Kerren, Plaisant, Stasko, 2011).


Beside doing research to solve the main challenges in information visualizations, members of the

community also developed InfoVis toolkits. These toolkits can be used to create data visualizations

and integrate the findings of the community into the different features of the tool. The best-known

example is Tableau, which can be used by less technically sophisticated and knowledgeable people.

These people got less attention by the InfoVis community in the past, but due to the big process in

technology more and more domains (economics, agriculture, psychologists, etc.) are collecting data

and trying to visualize this data to get insights from it (Few, & Edge, 2008). Take for example a

farmer, in the past he just milks his cows twice a day and at the end of each day he could check the

amount of liters milk he collected. Nowadays, data is collected from each cow separately, like for

example the number of liters it gives, but also the number of steps it took, how much hours of the

day it was eating, etc. Thanks to toolkits like Tableau software, this data is visualized such that

farmers can easily consult this data. However, giving these farmers the possibility to easily consult

the collected data does not mean that they can easily get insights from it. Research was needed to

find out in which way the data had to be visualized such that the farmer could easily understand it

and gain knowledge from it.

This example shows that the shift of data collection from the academic and scientific world to other

domains resulted in several challenges for the InfoVis community. Similarly, new challenges

appeared with the rise of the quantified-self community. With the advances in the consumer

technology, not only scientist or domain experts (like farmers, medicals, or biologists (Sariaya, North,

& Duca, 2005)) have access to data visualizations, but all type of users can use self-trackers to collect

data about themselves. We will refer to this group of users as the general user in the remainder of

this thesis. Another big change lies in the type of data that is collected. With the rise of QS and

Personal Informatics, the data became more personal. Therefore, the type of visualizations in

domain of personal informatics is called personal visualizations. Despite the target user changed and

the data to be visualized got more personal, the main purpose of personal visualizations is similar to

this of information visualizations, namely: giving the user insights into their data. New research

needed, and still need, to be done to find out in which way these general users get insights and what

types of insights they want to get from the data visualizations.

1.4 What is Insight

‘The purpose of visualization is insight not pictures’, this is what Card, Mackinly, and Schneiderman

(1999) conclude in their book ‘reading and information visualization: Using Vision to Think’.

Therefore, it is important to define what insight in the domain of quantified self means, which will

help to better understand in which way the general user gets insights from his data. According to the


Cambridge Dictionary, Insight can be defined as: “the ability to have a clear, deep, and sometimes

sudden understanding of a complicated problem or situation.” Another definition is given by

plaisant, Fekete, and Grinstein (2008, p. 120), who declare insight as: “a nontrivial discovery about

the data.” Both these definitions are rather wide. A more specific definition can be found in the

domain of information visualization (infoVis) in which they try to give insights into complex data.

Saraiya, North, and Duca (2005, p. 444) defines insight as followed: “An individual observation about

the data by the participants, a unit of discovery.” Another study in the InfoVis domain by North

(2006) suggests that any formal definition of Insight is either too restrictive or too general and

vague, therefore he concluded that it Is better to define five different characteristics of insight to get

a better understanding of insight, namely: complex, deep, qualitative, unexpected, and relevant. In

all those descriptions and definitions, the same idea of something that is gained or discovered from

the data by the user is given. Based on this idea, the following definition of Insight will be used in this

master thesis: “A unit of discovery gained from the data, which can lead to better self-knowledge.”

1.5 Which Insight types exist

Now we know what an insight is, it is important to answer the question “which insights can be

gained from a specific visualization?”. Before we can answer this question, it is important to better

understand which types of insights exists and can be offered by a visualization.

In the field of Information visualization, Yang, Li, and Zhou (2014) created a taxonomy of insights

people derive from visualizations. They presented three different simple graphics to more than 500

participants. For each graph, the participant had to express what they understand and learn from it

in free text, with a minimum of 15 words. These texts were coded and analyzed. From these analysis,

they found eight types of insights, which they organized in two groups, namely: basic insights and

comparative insights. The four types of basic insights are Read Value, Identify Extrema, Characterize

Distribution, and Describe Correlation. The four types of comparative insights are compare values,

compare extrema, compare distribution, and compare correlation. In this study, participants had to

achieve insights from data that is not related to themselves. This is different with self-tracked data,

where visualizations are created from personal data. In the quantified-self domain a taxonomy was

created by Choe, Lee, and Schraefel (2015). They analyzed 55 Q-selfers’ (members of the Quantified

Self community) presentations to analyze the insights they got from their data and the visualizations

they use to show these insights. They identified eight insight types, namely: detail, self-reflection,

trend, comparison, correlation, data summary, distribution, and outlier. In Table 1, a description is

given for each of this types with their subtypes.


Table 1: Description of the 8 different Insight Types as given by Choe et al. (2011)

Types Description

Detail 1) Explicitly state the identities of the data points possessing extreme values of the

measure variable

2) Explicitly specify the measured value, its range for one or more clearly identified data

points.

3) Explicitly state the values of categorical variables, labels from the axes, or legends

Self-

Reflection

/Recall

1) Uncaptured data provided by the presenter to understand and explain a

phenomenon shown in the data

2) Collected data contradicts existing knowledge

3) Predict the future based on the collected data

4) Collected data confirms existing knowledge

Trend Describe changes over time

Comparison 1) Compare measured values by a factor (other than time)

2) Compare measured values segmented by time

3) Bringing in external data for comparison

4) Compare two specific instances

Correlation Specify the direct relationship between two variables (but not as comparison)

Data

summary

Summary of collected data (such as number of data points and duration of tracking)

Distribution 1) Explicitly state the variability of measured values

2)Explicitly describe the variation of measured values across all or most of the values of

a categorical variable

Outlier Explicitly point out outliers or state the effect of outliers

In a recent study Choe et al. (2017) built a web-based application (Visualized Self) based on the eight

insight types and conducted think-aloud study to examine how people reflect on their personal data

and what types of insights they gain. In this study the participants were not only members of the

quantified-self community, but also other self-trackers. Based on interviews with the participants,

they conclude that their web-application helped participants identify most of 8 personal insight

types reported.

1.6 Data Visualizations

Despite of the fact that the way self-tracking applications provide the data to the user is often not in

line with the user’s expectations (Epstein et al., 2016), developers already came up with multiple

ways to make the data available for the user. All attempting to give the user the opportunity to gain

some insights. The approaches current applications use to show the collected data can be divided in

three types of methods, namely: abstract ways to present data, data visualizations, or providing raw

data.

Protagonist of the abstract ways of presenting self-tracked data argues that this data is mostly

abstract in nature (Moere, 2008). Measured data like heart rate, skin conductance, etc., do not have


a one-to-one relationship with objects in the real world. Therefore, they belief that more natural

ways of presenting data can also be successful in conveying information (Consolvo, McDonald, &

Landay, 2009). One example of such an abstract way of presenting the data to the user is by art. Fan,

Forlizzi, and Dey (2012) developed a web application, called Spark, that visualizes physical activity

data from Fitbit trackers in abstract art. Users could choose among five visualizations when entering

Spark, four abstract ways and one graph. After three weeks, they interviewed several families who

used the application. Their results suggest that the abstract visualizations are appealing and

aesthetic, but they also found that most participants chose for the graph visualizations when they

looked for specific information, daily trends, or activity over time. Another example of visualizing

data in an abstract way, is the use of physical visualizations. Stusak, Tabard, Sauka, Khot, and Butz

(2014) presented physical running data with data sculptures. From every run, they created a unique

physical piece, which is part of a larger sculpture. After a three-week field study, they performed

interviews to evaluate their concept. The results of these interviews showed that participants had

difficulties to understand how their run was visualized. More recent studies also investigated the use

of palatable representations and chocolate printed messages to visualize physical activity data (Khot,

Lee, Aggarwal, Hjorth, & Mueller, 2015; Khot, Pennings, & Mueller, 2015). The results of these

studies all show that the abstract ways of visualizing data have some advantages in attractiveness

and aesthetics. However, some of them also show that users have difficulties to get insights from

these visualizations. As we described before, the main purpose of personal visualizations is giving

the users insights into their data.

The more traditional way of presenting data with graphical representations (graphs, scatterplots,

timeline-visualizations, etc.) can be more effective in giving the user the opportunity to get insights

from their data. We will give different examples of studies showing that general users can get

insights from graphical presentations. The first example is MobiTop, developed by Park, Oliver, and

Pedro (2015). It is a self-reflection tool for understanding browsing behavior. The tool shows an

overview of the user’s mobile browsing app with different graphs. A first pilot study with the tool

showed that users could get insights from these graphs.

Another example is Lullaby, a capture and access system for understanding the sleep environment

(Kay et al., 2012). Lullaby combines temperature, light, motion, audio, and sleep data and show this

data to the user with graphical representations, like timeline visualizations. To evaluate the system,

Lullaby was installed in four participants their homes. After a two-week study, the participants were

interviewed to explore their experiences with the system. The results of the interviews show that

the participants could get insights from the graphs.


A recent study on preventive blood self-monitoring from Verdezoto and Grönvall (2016) also

investigate the usefulness of graphical representations. They create different line charts (daily and

weekly) visualizations from participants blood pressure data. The visualizations were shown in a

workshop, where participants, mainly elderly, had to evaluate them. Most of them found the charts

useful and got insights from them.

In another study, Cuttone, Lehmann, and Larsen (2013) showed a mobile personal informatics

system with visualizations in the form of spiral timelines and bubbles, which are more recent types

of visualization techniques. In their experiment, they collected location data from 136 participants

for four months. A participant survey afterwards showed that more than half of the participants

reported preference for getting insights with these visualizations.

All four of these examples show that users could get insights from graphical and scientific

visualizations, even when these users were general users without any knowledge of graphs, like the

elderly in the study from Verdezoto and Grönvall. These results are in line with a study from Smuc et

al. (2008). They investigated the generation of insights and understanding by means of visualizations

without any training. A case study was conducted to find out if some knowledge about a

visualization is necessary. In the study, participants, both experts in visualizations and non-experts,

were asked to give the insights they got from two types of visualizations (cycleplots and multiscale

visualizations) using the think-aloud method. When they were not able to generate new insights,

they got a short explanation of the visualizations. Next, they were asked again to give the insights

they get by using the think-aloud method. Afterwards the results of the expert in data visualizations

were compared with the results of non-experts. Little differences were found between them, both

could generate insights into the data in similar time and quality.

All these examples show that the use of graphical representations can be effective to give the user

the opportunity to get insights from their data. However, users often stop tracking because they are

not satisfied with the presented information and the visualization do not always show what the user

expected, even when their data is presented with graphical representations. This allows us to

conclude that it is important to know which insights can be obtained from a graphical

representation. Or in other words, graphical representations need to be evaluated on the insights

the general user can achieve from them.


1.7 Evaluating Personal Visualizations

Based on the different insight types found by Choe et al. (2015) and the different studies showing

the capability of the general self-tracker to get insights from graphical visualizations, we believe that

the number of people who stop tracking can be reduced by offering them the right visualizations.

These visualizations need to provide the self-tracker the data-based insights that he expected to gain

from the visualizations. But how can we measure if the provided visualization gives the user the

opportunity to get those insights from the data? Or in other words: in which way are visualizations

evaluated on the insights people can get from them?

Despite of different attempts made in the past to create solid theories and models, there is still a lot

of research needed in the field of visualization evaluations (Keim et al., 2010). One of the main

reasons that proper evaluation of visualizations is hard is that humans play a central role in these

visualizations. In chapter 8 of the book: ‘solving problems with visual analtyics’, Keim et al. (2010)

describe the state of the art of evaluation methodologies in visual analytics. Methodologies and

metrics are needed to objectively measure usability, learnability, quantification of insights and

technology benefits. North (2006) focuses on the question: how to measure insight? He argues that

the purpose of visualization evaluation is to determine to what degree visualizations achieve

insights. In his Visualization Viewpoints paper, he proposes two new evaluation methods that

measure insight more directly. In the first method, he suggests including more complex benchmark

tasks in the existing controlled experiments. These tasks can ask for correlations, distributions or

other types of insight types. Where in previous controlled experiments with benchmark tasks, the

correctness measure is Boolean (right or wrong), he argues to work with multiple choice questions

for these tasks. North argues that these types of tasks generally support visualization overviews. The

second method North proposes is to eliminate benchmarks tasks and using a qualitative insight

analysis in the form of a think-aloud study. Afterwards the qualitative data must be convert into

quantitative data by coding. After converting, researchers can count the number of insights and the

time they needed. North realizes that these proposals are a good first step to new methods that

determine whether visualizations are achieving their grand purpose, namely insight. This paper of

North can be considered as one of the leading papers towards insight-based visualizations.

The first example of an insight-based methodology for evaluating visualizations based on the

guidelines of North, was in the area of bioinformatics. Saraiya, North, and Duca (2005) evaluate five

different visualization tools for bioinformatics. An open-ended think-aloud study was done where

participants analyze the data. Saraiya et al. conclude that the methodology succeeded in providing a

good analysis of the insight capabilities of the visualizations, but that there are still some difficulties.


The main difficulties are that the method is labor intensiveness and users should be motivated for

the think-aloud study. They propose that further work is needed to overcome these main difficulties,

such that the effectiveness of visualizations in many domains can be measured more easily.

In a more recent study from North, Saraiya, and Duca (2011) a comparison is made between a

benchmark tasks evaluation method and the insight-based evaluation method for information

visualization. One of the main conclusions of this study was that the insight method refutes the task

method, which can suggest that users behave differently when not in the forced direction of a task-

based method. Also, in this study the insight method was surprisingly more powerful than the task

method.

Despite the insight-based methodologies proposed by North and colleagues can be seen as one of

the leading methods in evaluation of visualizations, recently other evaluation methods are shown to

be useful as well. One of such a method is the Elicitaition Interview Technique. Hogan, Hinrichts, &

Hornecker (2015) discuss the potential of this interview technique in the context of visualization.

They claim that the Elicitation interview technique is a valuable addition to current evaluation

techniques. It can help to evaluate how people experience visualizations and can be applied when

studying how visualization supports data analysis processes.

The papers, described above, are all theoretical proposal on how to evaluate visualizations, but

there are also studies, in different domains, that evaluate visualizations in real world settings.

One of these studies was done in the domain of the Learning Analytics. Govaerts, Duval, Verbert, &

Pardo (2012) developed the Student Activity Meter that visualizes learner actions. They had different

design iterations and used both quantitative and qualitative evaluation studies to access the

usability, use, and usefulness of the visualizations. For the first iteration computer science students

evaluated the visualizations in a task-based interview. The students’ satisfaction was also measured

with the System Usability Scale and Microsoft Desirability Toolkit. For the next iteration teachers

evaluated the visualizations by filling in a survey. This survey contained multiple choice and open

questions. For the last iteration computer science teachers evaluated the visualizations in a face-to-

face interview with fixed questionnaire of 25 open questions and tasks. Govaerts et al. (2012)

conclude that the methodology they used can be useful for other evaluations of visualizations. They

believe that evaluating iteratively is very valuable and hope that this work will inspire others to do

more user studies on visualizations.

Another example of evaluation of visualizations from real-life data came from Zhao, Ng, & Cosley

(2012). They developed pieTime, a visualization of people’s behavior across communication media.


They evaluate pieTime on how well it supports reflection on patterns of activity. The evaluation was

as follow: the participants first got some time to play around with the tool while thinking-aloud; next

an interview was done, where some open-questions were asked. One of the limitations Zhao et al.

remarked, was that they used a short-term lab-based deployment to evaluate the tool and that such

evaluations might overstate their value. Sharmin & Bailey (2013) developed ReflectionSpace, an

interactive visualization tool for supporting activities associated with reflection in action. It is a tool

that map design materials to the design phase and context of use. To evaluate the tool, Sharmin &

Bailey followed a cued recall paradigm where participants had to recall things about a project with

and without the tool. Afterwards the participants were interviewed about their overall perceptions

of the tool.

The tree examples described above are not related to the quantified-self domain, this is because

Personal visualizations or visualizations of personal informatics data are relatively new. Therefore,

the number of studies done towards effectiveness of these evaluations is limited. Choe et al. (2015)

identified their 8 types of insights by analyzing public data in the form of Q-selfers’ presentations,

and to test these types of insights, Choe et al. (2017) built an application and let evaluate the users

the application by doing an insight-based think-aloud study. The use of different qualitative

methods from other disciplines in the quantified-self domain was proposed by Thudt et al. (2017).

They provide four empirical methods (Elicitation Interviews, Probing, Diary studies, and Analysis of

public data) which can inform or evaluate the design of personal visualizations. All four of these are

qualitative and time-consuming methods. Furthermore, it is hard to compare results of different

studies which used these evaluation methods.

Despite of the work by North (2006) and other researchers who came up with an insight-based

methodology to evaluate visualizations, we see that often visualizations in the personal informatics

domain are evaluated with complex and intensive methodologies. Visualizations are often evaluated

by experts or by doing a qualitative method where the visualizations are shown to a limited number

of participants and the questions or tasks are generated by the researchers without any reference to

previous studies.


1.8 Need for a New Evaluation Methodology

We showed before that it is very important in the field of personal informatics to know if the user

can gain insights from the visualizations. Therefore, developers and researchers must be able to

evaluate the visualizations they created, before distributing them to the self-trackers. We believe

that in this way, the number of self-trackers that stop tracking their data will be diminished. Till now

evaluation of visualization is mostly done with qualitative methods. Considered the rapid growth of

the quantified self, there clearly is a need for a more efficient and easy methodology to evaluate

personal informatics visualizations. Such an easy and time-efficient data-collection method is a

questionnaire. Next to its efficiency and its usability, a questionnaire supports objectively and allows

easy comparison between results. Therefore, the main aim of this master thesis is to develop a

quantitative method to evaluate personal-informatics visualizations on the insight people can gain

from them. A second aim of this thesis is to find out to which extend the insights found with the

developed questionnaire can be compared with insights found with a qualitative method.

These two aims will be studied in two different studies. In a first phase, the questionnaire is created

based on current literature and a selection of the best question-items. In a second phase, the

created questionnaire is compared with the insight-based methodology, which is given earlier in this

section.

In the remainder of this master thesis, these different phases to develop and compare the

questionnaire for evaluation of personal visualization are shown. For each of these phases, the study

performed during the phase is described and the results of these studies are given and discussed.

After describing each phase separately, a general discussion is made in which an answer on the main

aim and other goals of thesis is given. Next, limitations of the conducted studies are given together

with some suggestions for future research. Finally, an overall conclusion is made.


2 Study I: Creating the Questionnaire

2.1 Introduction

The developed quantitative method is a questionnaire with which researchers and developers can

evaluate their visualizations on insights people can gain from them. As shown in the introduction,

previous studies found that there are eight different insight types people can get from a personal

visualization (Choe et al., 2015). Therefore, the questionnaire should be able to measure for each of

these insight type if a person is could achieve it from a visualization.

Qualitative methods showed that statements given by a participant can be categorized to a specific

insight. We argued that similar statements related to an insight type could be used to develop the

questionnaire. In this first study, we investigated which of these similar statements should be

included in the questionnaire.

2.2 Methods

2.2.1 Design & Participants

A survey was posted online for three weeks to determine which statements per insight type should

be included in the final questionnaire.

In this survey, participants had to classify a set of statements towards one of the eight insight types

as found by Choe et al. (2015) (table 1). Participants had to classify each of these statements to one

of the eight insight types to see if the statements indeed related towards the insight types they were

created for. We argued that there was only one right answer for each statement, namely the insight

type for which the statement was created. As one of our main goals was to find an efficient way to

evaluate visualizations, we decided to include only a small number of statements for each insight

type. We looked for the statements with the highest classification and expected to find at least four

statements for each insight type that have a high classification rate.

More than 400 people opened the online survey, but most of them stopped after reading the

informed consent or reading the descriptions of the insight types. A total of 63 participants started

classifying statements. Only 51 of them classified all 55 statements towards one of the insight types.

The other 12 participants filled in a part of the survey. As some of the insight types and their

descriptions ask for some basic statistical knowledge, the online survey was only shared with

university students. All students following the course ‘Research Methods’ at Eindhoven University of

Technology were send an invitation. An invitation for the online survey was also send to fellow


students in the Master Human-Technology Interaction and university students in other master

degrees.

2.2.2 Statements

To create the initial pool of statements, we collected all quotes categorized to one of the insight

types that were given in previous studies. These previous studies existed of studies reviewed by

Kersten-van Dijk, Westerink, Beute, and IJsselsteijn (2017) in their critical review on personal

informatics, self-insights and behavior change together with other recent studies (Choe, Lee, Lee,

Pratt, & Kientz, 2014; Choe, Lee, Kay, Pratt, & Kientz, 2015; Choe, Lee, Zhu, Riche, & Bauer, 2017;

Collins, Cox, Bird, & Cornish-Tresstail, 2014; Cuttone, Lehmann, & Larsen, 2013; Epstein, Cordeiro,

Bales, Fogarty, & Munson, 2014; Kay et al. 2012; Li, Dey, & Forlizzi, 2011; Rooksby, Rost, Morrison,

& Chalmers, 2014) .

In these studies, insights were coded from quotes found during interviews, focus groups or think

aloud studies. These quotes were listed for each insight type. Based on these lists and the

descriptions of each insight type given by Choe et al. (2015) 6 to 8 statements were created per

insight types. At the end 55 statements were created (Appendix A).

2.2.3 Procedure

When participants opened the link to the online survey, they first got an informed consent

(Appendix B). They had to read and accept this informed consent before they could start filling in the

survey. Next the description of each insight type (table 1) was shown to the participants. They were

asked to clearly read these descriptions, before entering the next page. The next seven pages each

showed eight of the created statements in a random order. Participants had to classify each of these

statements to one of the eight insight types. During the survey, the participants could always consult

these descriptions. After classifying all 55 statements, the goal of the study was explained.

2.2.4 Data Analysis

For each statement, the classification rate that this statement was classified to the insight type for

which is was created is calculated. Furthermore, we also calculated the proportions that the

statement was classified to the other insight types.


2.3 Results

2.3.1 Outliers and Partly filled in Surveys

One participant, who filled in the whole survey, only categorize 16% (9/55) of the statements

correctly. We argue that this participant did not fully understand all 8 insight types and decided to

drop this observation.

As described in the method section, 12 participants only partly filled in the survey. From these 12

participants, 5 participants stopped after the first page of statements, 4 after the second page, one

after the third page, and 2 after the fourth page. We had a better look to these 12 participants,

trying to understand the reason why they stopped filling in the survey and drop the results of them

when needed.

Participants with ID’s 25, 30, 79, 116 and 151 all stopped after filling in one page of statements,

which means they classified 7 or 8 statements. All of them classified at least three of these

statements to another insight type than the type for which the statement was created. We argue

that 7 out of 52 statements classified is too low to consider these data point as reliable and dropped

them. ID 71, 89, 189 and 191 filled in two pages of statements, except from ID 89, these participants

classified most of the 15 or 16 statements right. The person with ID 89 classified only 6 of the 16

statements right. Based on this number and the fact that this person only filled in 16 of the 55

statements, we decided to drop this observation. We calculated for the remaining 56 participants

the number of statements they qualified correctly on the total number of statements they qualified.

An average classification rate of 54.76% (SD=10.66) is found. If we only considered participants who

classified all statements, a classification rate of 55.05% (SD=10.09) is found, which is a similar result.

Therefore, we decided to include these partly filled in surveys.

The analysis to find the best statements for each insight type is done with the remaining 57

observations.

2.2.3 Classification per Insight Type

As the results of total classification rate already illustrated, not all statements were classified

towards the insight type for which these statements were created. As one of our main goals was to

find an efficient way to evaluate visualizations, we decided to include four statements for each

insight type in the final version of the scale. Table 2 gives for each insight types the four statements

that had the highest classification rate for the insight for which they were created. The names of the

statements indicate for which insight type the statement was created. For each of these statements

the relative frequencies are given for the number of times the statements were classified into the

eight different insight types.


Table 2: Four ‘best’ classified statements per insight type with its relative frequencies of correct classifications

Statements Det

ail

Co

mp

aris

on

Co

rrel

atio

n

Dat

a

Sum

mar

y

Tren

d

Self

-re

flec

tio

n

Ou

tlie

r

Dis

trib

uti

on

Detail 3 73.08% 0% 0% 15.38% 1.92% 3.85% 1.92% 3.85%

Detail 5 75.93% 3.70% 0% 7.41% 3.70% 1.85% 3.70% 3.70%

Detail 6 57.41% 9.26% 3.70% 5.56% 0% 20.37% 1.85% 1.85%

Detail 8 62.96% 1.85% 0% 14.81% 0% 7.41% 1.85% 11.11%

Comparison 1 0% 62.96% 33.33% 0% 1.85% 1.85% 0% 0%

Comparison 3 5.66% 86.79% 1.89% 0% 3.77% 0% 0% 1.89%

Comparison 4 5.66% 90.57% 1.89% 0% 0% 1.89% 0% 0%

Comparison 7 3.85% 86.54% 0% 1.92% 5.77% 1.92% 0% 0%

Correlation 1 1.85% 11.11% 77.78% 1.85% 1.85% 1.85% 3.70% 0%

Correlation 3 1.92% 19.23% 75.00% 0% 3.85% 0% 0% 0%

Correlation 5 1.85% 11.11% 85.19% 0% 0% 0% 0% 0%

Correlation 6 1.92% 9.62% 78.85% 5.77% 1.92% 0% 0% 1.92%

Data Summary 1 12.96% 0% 0% 74.07% 1.85% 0% 5.56% 5.56%

Data Summary 4 24.00% 0% 0% 70.00% 0% 0% 2.00% 4.00%

Data Summary 5 5.56% 5.56% 0% 72.22% 7.41% 0% 1.85% 7.41%

Data Summary 6 3.70% 0% 0% 85.19% 0% 7.41% 3.70% 0%

Trend 2 2.00% 2.00% 10.00% 4.00% 74.00% 4.00% 2.00% 2.00%

Trend 3 2.00% 4.00% 8.00% 6.00% 66.00% 4.00% 0% 10.00%

Trend 5 0% 3.85% 0% 1.92% 88.46% 1.92% 0% 3.85%

Trend 6 1.92% 13.46% 0% 9.62% 69.23% 0% 0% 5.77%

Self-reflection 4 0% 3.85% 0% 3.85% 3.85% 84.62% 0% 3.85%

Self-reflection 5 4.00% 8.00% 4.00% 0% 2.00% 70.00% 10.00% 2.00%

Self-reflection 7 1.92% 5.77% 0% 1.92% 9.62% 76.92% 3.85% 0%

Self-reflection 8 1.92% 1.92% 3.85% 0% 1.92% 80.77% 7.96% 1.92%

Outlier 1 3.70% 11.11% 3.70% 0% 5.56% 1.85% 62.96% 11.11%

Outlier 3 26.00% 10.00% 10.00% 8.00% 0% 12.00% 28.00% 6.00%

Outlier 5 9.62% 19.23% 0% 3.85% 1.92% 13.46% 44.23% 7.69%

Outlier 6 9.62% 34.62% 0% 1.92% 0% 0% 46.15% 7.69%

Distribution 2 0% 5.66% 1.89% 11.32% 30.19% 7.55% 1.89% 41.51%

Distribution 4 31.48% 1.85% 1.85% 16.67% 12.96% 14.81% 0% 20.37%

Distribution 5 24.07% 7.41% 3.70% 18.52% 7.41% 16.67% 0% 22.22%

Distribution 6 7.69% 7.69% 3.85% 7.69% 40.38% 1.92% 3.85% 26.92%


2.4 Discussion

The average classification rate found showed that participants classified slightly more than half of

the statements to the insight type this statement was created for. This showed that despite of the

fact that the statements were based on quotes which were categorized to these insight types in

previous studies, it was still very difficult to classify them. No clear reason, based on the data, can be

given for these results.

Many people opened the survey, but stopped after reading the description of the insight types. This

can indicate that the descriptions of the insight types are hard to understand. We were aware that

some basic statistical knowledge was needed to understand these descriptions.

Despite of the fact that the average classification rate was rather low, we found for six of the eight

insight types at least four statements with a high classification rate. As shown in table 2, these

insight types were: Detail, Comparison, Correlation, Data Summary, Trend, and Self-reflection.

For most of the statements in the insight types, a classification rate higher than 70% was found.

Compared to the average classification rate, these rates are high.

For the statements with a classification rate lower than 70%, we found a big difference with the

other insight types, which means that these statements were harder to classify to the type they

were created for, but cannot be classified as one of the other insight types. Therefore, we decided to

include these statements in the final questionnaire.

The results in table 2 also show that we could not find four statements that clearly can be classified

to Outlier. A large proportion of participants classified the statements created for insight type

Outlier to insight type Detail or Comparison. An explanation can be found in the descriptions given

by Choe et al. (2015) and thus also at the beginning of the study (table 1). Insight type outlier is

described as “Explicitly point out outliers or state the effect of outliers”. To decide if an observation

is an outlier and thus explicitly point out an outlier, you have to compare this observation with other

observations in the visualization. This can explain the high proportion of participants that classified

these statements as Comparison.

Similarly, the description given to the participants for Detail was as followed: “Explicitly state the

identities of the data points possessing extreme values”. Often, an observation is considered as an

outlier, when the measured variable is extremely different from other observations. In this way, we

can consider insight type outlier as a subtype of insight type Detail.

Based on the high proportion found for types Detail and Comparison and the clear overlap in the


descriptions of outliers and these types, we argued that insight type Outlier is not mutual exclusive

from these two other insight types. Therefore, we decided to not include statements for Outlier in

our final questionnaire.

A second insight type, for which we could not find four statements with a high classification rate, is

Distribution. The results in table 2 shows that statements created for Distribution are often classified

as Detail or Trend. Similar to Outlier, these high classification rates for these types can be explained

with the descriptions given to the participants (table 1). The second point in the description of

Insight type Detail was: “Explicitly specify the measured value, its range for one or more clearly

identified data points”. To specify a range for more data points, you have to take a better look to

the minimum and maximum value of these data points. Overlap can be found with the first point in

the description given for Distribution, which was as followed: “Explicitly state the variability of

measured values”. To state a variability of measured values, you also have to look the minimum and

maximum values found. This explains why a large proportion of the participants classified these

statements to Detail instead of Distribution.

Furthermore, to state a variability of a variable, you have to consider all measured values. Self-

tracked data is often measured in time, which makes that the variability of a variable will often be

stated with respect to time. However, the descriptions of changes over time is categorized as insight

type Trend. We conclude that ‘Describing the changes over time’ and ‘State the variability of

measured values’ cannot always be considered as two different types of analysis. This explains why

many participants classified statements created for insight type Distribution as Trend.

Based on the high proportion found for types Detail and Trend and the clear overlap in the

descriptions of Distribution and these types, we decided to not include statements for Distribution in

our final questionnaire.

The final questionnaire is created based on the discussion before and shown in Appendix C. The

questionnaire exists of 24 statements, four for each of the six insight types that were included.

For these statements participants have to indicate on a 7-point scale ranging from 1- ‘strongly

disagree’ to 7 – ‘strongly agree’ how much they agree with the statements. A 7-point scale was

chosen such that the variety of answers participants can give is large enough and measurements of

insight types are not dichotomous. Based on what the participants indicate on these statements, a

score for each insight type can be calculated, such that a visualization can be evaluated on the

insight types people can get from them.


2.5 Conclusion

The main goal of this study was to create a questionnaire with which researchers and developers can

evaluate data-visualizations on the insights people can get from them. Based on the results of this

study we reduced the number of insight types from eight to six. For each of these six insight types,

four statements were found.

We believe that the median score people give on these four statements will indicate how easily

participants can gain the corresponding insight type from a personal informatics visualization.

To check the validity of the questionnaire a second study needed to be done.


3 Study II: Comparison of the Questionnaire with the

Insight-Based Methodology

3.1 Introduction

After developing the questionnaire, a study is performed to test the questionnaire on its concurrent

validity. Therefore, we compared the questionnaire with one of the complex and time-consuming

qualitative methods described in the general introduction. As shown in this introduction, different

qualitative methods are used in the past to evaluate visualizations. The best-known method to

evaluate personal visualizations is the insight-based methodology as proposed by North (2006).

Besides, this paper of North can be considered as one of the leading paper towards measuring

insights, the insight-based methodology was also used by Choe, Lee, Zhu, Riche, and Bauer (2017) to

evaluate their visualization on the insight people could get from it. The insight-based methodology

exists of an open-ended think-aloud study, where the participants have to discover visualizations.

In this study, multiple visualizations will be evaluated with both the developed questionnaire and the

insight-based methodology as it was conducted by Choe et al. (2017). In this way the results of the

developed questionnaire can be compared with the qualitative method, both on visualization- and

on insight level.

3.2 Methods

3.2.2 Participants

Fifteen participants took part in the experiment. They were students at Eindhoven University of

Technology and registered in the database of the Human-Technology Interaction group. The

participants included five men and ten women. They were between 20 to 25 years old (mean = 22,

SD = 1.5). The experiment took 40 minutes and a compensation of €7 for participating was provided.

3.2.3 Design & Materials

A within subject design was used to compare the developed questionnaire with the insight-based

methodology. The insight types a participant could achieve from a visualization was measured with

both methods. The 24-item scale developed in study 1 was used, which gives for each of the six

insight types a score on a 7-point scale how easy a participant could gain an insight type from a

visualization. The insight-based methodology measured for the same six insight types if they could

be achieved or not. As described in the introduction, the insight-based methodology exists of a


think-aloud interview, were people had to discover data-visualizations and in the meantime had to

say at loud what they were thinking or what they were doing.

Participants first conducted the think-aloud interviews before filling in the questionnaires. In this

way, insights they achieved during the think-aloud interview were not triggered by statements they

read in the questionnaires.

Visualizations

In total six different visualizations were created, one test visualization and five visualizations used in

both methods. The test visualization was used to make the participant familiar with the think-aloud

protocol. We used the Plotly tool, in Python, to create the interactive visualizations. These

visualizations were implemented in a framework such that the visualizations could been shown via a

web application.

The visualizations were based on visualizations used by existing mobile fitness applications. As

described in the introduction, people can keep track of their sleep pattern, daily exercise, sport

activities, energy use, etc. While some of the applications measures multiple features, for example

Fitbit app, Moves app, Google Fit app. Others are more specialized in collecting and visualizing one

type of measurement, for example Runtastic, Pillow Sleep, or Nike+ app. The researcher of this

master thesis downloaded and tested multiple of these free fitness applications to get more

acquainted with the way existing applications visualize their data.

In this master thesis we will not only focus on one type of data that can be collected, but try to

develop a methodology that can evaluate visualizations from different types of self-tracked data.

Therefore, the six visualizations were all based on different types of data that can be collected. Each

visualization existed of two graphs, which were created from the same data or from related types of

measurements. The graphs in three of the six visualizations were based on one-month Fitbit data

collected by one of the supervisors. The graphs in the other three visualizations were created from

simulated data for the same month. Beside the different types of data, we also decided to use

different types of graphs that are used in existed mobile fitness applications.


The test visualization existed of a bar chart and a pie chart, which presented the daily number of

steps and distance walked of a person for one month (Figure 2). The bar chart showed these number

of steps and distance walked for every day separately. The line chart gave the sum of these variables

until a specific day. This specific day could be chosen by the participants

Figure 2. Test Visualization – Bar charts presenting a person’s daily number of steps and distance for one month, the line chart shows the total number of steps and distance (km) a person took until a specific day in

the month.


The first visualization, which was used for both methodologies, contained a stacked bar chart and

pie chart to show the duration of different types of activity of a person throughout a day (Figure 3).

The stacked bar chart showed these activity in minutes for every day of the month. The pie chart

gave the activity for one specific day in percentiles.

Figure 3. Visualization 1 – A person’s different ways of activity throughout the day in minutes given in stacked bar charts. A pie charts show the different types of activity a person had in percentiles for one specific day.


The second visualization presented the number of calories a person burned in rest and by activity

per day. Data for one month was shown with a bar chart and a scatter plot (Figure 4). The bar chart

showed for every day the calories burned by rest, by activity and in total. The scatter plot presented

the calories burned in rest vs the calories burned by activity per day.

Figure 4. Visualization 2 – showing a person’s calories burned in rest and by activity for a day for one month with a bar chart. The scatter plot shows the calories burned by activity versus the calories burned in rest for

each day.


The third visualization existed of an Gantt Chart and Gauge chart which presented objective and

subjective sleep data for one month (Figure 5). The Gant Chart presented the sleep pattern of a

person for every night of the month. The Gauge chart gave a self-indicated sleep quality score for

each night.

Figure 5. Visualization 3 – A gauge chart shows a person’s sleep pattern for every night in one month. For every night the hours of awake, light sleep, REM, and deep sleep is given. The second graph show the self-

reported sleep quality for one night.


In the fourth visualization, the first graph presented the body temperature data of one month with a

heat map. The second graph presented the Heart rate of this person for the same month with a line

chart with error bars (Figure 6).

Figure 6. Visualization 4 – The graph above shows a person’s body temperature of one month, measured three times a day is shown in a heatmap. The graph below shows the same person’s heart rate with a line chart with

error bars. The error bars indicate the maximum and minimum heart rate reached that day.


The fifth and last visualization participants had to evaluate existed of a Bubble chart and two

boxplots presenting a person’s running and cycling activities of one month (Figure 7). The Bubble

chart presented each activity on a time-scale. The boxplots presented a summary of all running and

cycling activities.

Figure 7. Visualization 5 – The graph above shows a person’s cycling and running activities of one month shown in a bubble chart. The size of the bubble indicates the speed of the activity. The graph below gives a

summary of the running and cycling activities based on speed and distance.


Questionnaires

For each of the visualizations, the questionnaire developed in the first phase of this master study

(Appendix C) was used to measure the insight types participants achieved from the visualizations. As

many of the statements in this qeustionnaire were for general data types, a modified questionnaire

was created for each of the visualizations (Appendix D).

For example for the first visualization, which presented the minutes a person was active, the

statement: “It is easy to compare two of my latest runs, based on the speed and the distance” was

tansformed to “It is easy to compare two days, based on the minutes sedentary and the minutes

active”.

3.2.4 Procedure

The experiment was conducted in the general-purpose lab of the Human-Technology Interaction

group. After arriving at the lab, the participants first read and signed the informed consent form

(Appendix E). The experiment leader shortly revised the procedure as they could read in this

informed consent to make sure the participant understood the think-aloud protocol. After explaining

the procedure, the recording was started.

Next, the participant took place behind the laptop and the test visualization was shown. Before the

participants started discovering the visualization, the following general question was asked: “What

can you learn from the data presented with this visualization and which information do you get from

this visualization.” When the experiment leader experienced that the thinking aloud goes well, he

told the participants to go to the next visualization.

Before each visualization, the same general question was asked. When the experiment leader

noticed that the participant was not thinking out loud, questions like “Why did you do this?” or

“What were you thinking?” were asked. If the participants indicated that they could not find any

new information, they could choose the next visualization to discover. After discovering all five

visualizations, the recording was stopped.

After the think-aloud interview, the participants were asked to fill in the developed questionnaire for

the five visualizations they just discovered. The questionnaires were given on paper and participants

filled them in, in the same order as they discovered the visualizations. Participants could still consult

the different visualizations when filling in these questionnaires. After filling in the questionnaires,

participants were debriefed and could ask questions about the purpose of the experiment.


3.2.5 Data Analysis

All quotes participants gave during the think-aloud interviews that could be related to one of the

eight insight types were collected. A content analysis was conducted to categorize these different

quotes to the insight types found by Choe et al. (2015). These categorizations were done by

comparing the quote with the list of quotes collected during the first phase of this thesis and with

the description of each insight type (Table 1).

During the questionnaire, participants had to fill in four statements for each insight type. The

Cronbach’s Alpha for each of the six insight types were calculated to investigate the level of internal

consistency. Furthermore, medians for the results of the four statements were calculated. In this

way one 7-point scale score for each insight type was found.

The scores for the insight types measured with the questionnaires were compared with the insights

found during the think-aloud study. We expected to find similar results with both methodologies. In

this way, the developed questionnaire could be tested on concurrent validity.


3.3 Results & Discussion

In this section the results of the second study will be given and discussed. First an overview and

description from the data of both the questionnaire and the think-aloud study is provided. Next, a

comparison is made between the insights found during these think-aloud interviews and the

questionnaires. Finally, the results on insight level over all five visualizations are discussed.

3.3.1 Determination of Insights

As described in the method section, all 15 participants filled in a questionnaire for five visualizations.

In these questionnaires, six different insight types are measured by 4 different statements. In the

previous study, we showed that a high classification rate was found for each of these statements

towards the insight types for which these statements were created. By letting participants

evaluating visualizations in this second study, we could investigate the level of internal consistency

of these four statements (Table 3).

A lower alpha value was found for insight type Detail. Which suggests that the four statements used

in the questionnaire are not measuring the same concept. In the discussion section of study 1, we

already argued that insight type Outlier be a subtype of Detail, this already indicated that insight

type Detail is a wider concept, containing multiple descriptions. The description given by Choe et al.

(2015) (Table 1) also indicates that the insight type Detail can be gained in multiple ways. For

example, by finding a specific value in the visualization, but also by understanding the legend and

the corresponding labels. However, we believe that if participants indicated in the questionnaire

that it is easy to achieve an insight type from the visualization for most of the statements, then they

could gain this insight type. Therefore, to calculate if the participant could achieve a specific insight

type, the median of the 4 different statements is taken, such that outliers will have a smaller effect

of the results.

Table 3: The Mean, Standard Deviation, and Cronbach’s alpha of the six insight types

Insight Type Mean Stand Deviation Cronbach’s alpha

Detail 6.107 1.457 0.510

Comparison 4.853 1.380 0.654

Correlation 3.723 1.551 0.728

Data Summary 4.653 1.518 0.821

Trend 4.900 1.724 0.823

Self-Reflection 4.747 1.321 0.845


The think-aloud interviews are decoded, as described in the methods section. A selection of quotes

from all participants and their corresponding insight type are given in Table 4. Similar to previous

studies, who used the insight-based methodology, insights are classified as gained or not, which

makes these variables dichotomous. We determined an insight type as gained by the participant,

when at least one quote belonging to this insight type was given during the interview.

Table 4: Examples of quotes and their corresponding Insight Type

Insight Type Quote Participant

Detail “It’s clear to see the outliers, for example the 11th of April” P4 (Vis. 2)

Detail “The number of calories is highest for the 24th.” P5 (Vis. 2)

Comparison “If you compare these results with the 18th of April, then you see that there is a lot more of deep sleep for this day.

P2 (Vis. 3)

Comparison “Big differences between averages for one day.” P5 (Vis. 4)

Correlation “You can see that if a shorter distance is covered, then the speed for this distance is bigger.”

P3 (Vis. 5)

Correlation

“There is no clear correlation between the objective and subjective measurements.”

P8 (Vis. 3)

Data Summary

“This person tracked data for one month.” P1 (Vis. 1)

Data Summary

“You can see for 30 days if you were really active or not active at al.”

P14 Vis 1)

Trend “You can see a pattern for the whole month.” P4 (Vis. 1)

Trend “You see that he is warmer in the morning over the month.” P12 (Vis. 4)

Self-reflection “In this way, this person can relate his experience with what really happened that night.”

P3 (Vis. 3)

Self-reflection “For running this person probably has a specific round, because you can’t see a scheme or increase.”

P9 (Vis. 5)

When coding the recordings and determining which insight types were found with both

methodologies, We experienced that both conducting the study and analyzing the collected data

took a lot more time with the insight-based methodology as with the questionnaire.

With the insight-based methodology, each recording had to be coded to find which insight types

were found by the participants. Note that in this study only one researcher coded the recordings of

all fifteen participants. Imagine how much more time it would cost to let multiple researchers code

the recordings. With the questionnaire we easily could calculate the median of the four statements

to find out which insights can be found in the visualizations. Based on this experience we can

conclude that the questionnaire is more time-efficient to use and the collected data is easier to

analyze.


3.3.2 Think-Aloud Interviews vs. Questionnaire

Comparison per Visualization

Beside verifying that the questionnaire is more efficient and easy to use, we also wanted to find out

if the insights found with the questionnaire for a certain visualization are equal to the insights found

with the insight-based methodology for the same visualization. Each participant evaluated five

visualizations. We argue that the graphs and data used in the visualization can have influenced the

insights that were found with both methodologies. Therefore, we decided to discuss the results for

each visualization separately. For each visualization, the results of the think-aloud methodology are

given by bar charts and the results of the questionnaire are presented with box plots.

The first visualization showed a person’s daily activity for one month with a stacked bar chart and a

pie chart. Comparing the results of both methodologies, we can conclude that in general the same

results are found for this visualization (Figure 8).

Both methods indicate that insight type Detail and Comparison can be achieved from the

visualization, which means that it is easy to find a specific value in the visualization or that it is easy

to make a comparison between different entities. Both methods also showed that the insight type

that is hardest to gain from this visualization is correlation. This means that it is hard to find a

positive or negative correlation between different variabels in the data. No agreement, could be

found for the other three insight types. Relative high medians were found with the questionnaire,

but during the think-aloud interviews around half of the participants could gain the insight type from

the visualization.


Figure 8. Results of both methods for Visualization 1

The second visualization presented the calories a person burned for each day of a month. A bar

chart and scatter plot was used to present the data. Again, with both methods the participants

indicate that it is easy to find details from the data in the visualizations. However, the results found

for the other insight types are different for both methods (Figure 9).

High median values are found for all six the insight types, which indicate that based on the results of

the questionnaire, we would conclude that participants can achieve these insight types from the

visualization.

During the think-aloud interviews, many participants did not find some of the insight types. Thus,

based on the results of the insight-based methodology, we would conclude that only insight type

Detail and Comparison can be achieved from the visualization.



Similar results were found for the third visualization and fourth visualization (Figure 9 and Figure

10), namely that the results of both methods are not in line with each other. Contrary to the results

of the second visualization, more insight types were found during the think-aloud interviews, but

with the questionnaire low median scores were given for most of these insight types.

Based on these results, we can conclude that the participants could get some insight types during

the think-aloud interviews. But when we asked them if it was easy to find these insight types, they

indicated that this was not the case.

The graphs used in these visualizations can be considered as more abstract than the graphs of the

other three visualizations. This can explain why many participants indicated in the questionnaire that

it is not easy to get those insight types. With the think-aloud interviews we only measured if they

could get the insight type or not, without considering the difficulty of finding it.


The fifth visualization presented a person’s sport activities of the last month with a bubble chart and

box plots. Like with the previous two visualizations, many participants found most of the insight

types during the think-aloud interviews, but with the questionnaires median of medians scores

between 4 and 5 were found for five of the six insight types. Furthermore, the variation of the

medians score differs a lot (Figure 11).

For the previous two visualizations, the differences between the two methods were explained based

on the graphs used in the visualizations. We believe that also the type of data that is presented can

explain some differences between the two methodologies. In this fifth visualization, a person’s

running and cycling activities are presented. This type of data can be more familiar for a participant

and consequently a participant can find it easier to discover the graph. This was also indicated by

some of the participants during the think-aloud interviews. Participant 4, for example, said: “When I

cycled faster, I did more kilometers, which make sense” and also participant 10 said: “That is quite

plausible, if you go cycling you can do longer pieces, but when you run you often have your optimum,

so you cannot go faster than a certain speed.”



Comparison per Insight Type

In the previous section, we discussed per visualizations which insight types were found with both

methods or only one method. The results showed that these insight types where different per

visualizations and could be dependent on the graphs or data used in the visualizations. Another goal

of this study was to find out the differences and similarities between the two methods on insight

level. An overview of the number of times a specific insight type is found and the medians of

medians for each insight type can be found in Appendix F.

Unambiguously results for both methodologies over all five visualizations were only found for one

insight type, namely Detail. Based on the description given by Choe et al. (2015) for this insight type

(table 1), we conclude that people can state the identities of data points with extreme values,

specify the measured value or its range, and find the values of categorical variables for all five

visualizations. Furthermore, the results for this insight type also indicate that people can already

gain the insight type Detail from data-visualizations used by mobile applications nowadays, because

the evaluated visualizations are all based on existing data-visualizations. The high number of times

some details were found by the participants are in line with results found by Choe et al. (2017). They

used the think-aloud method to let 11 participants evaluate one user-interface, containing of

multiple data-visualizations and count the number of times an insight type was gained. Their results

also showed that the insight type Detail could be easily found by the participants.

Insight type Comparison was found during almost all interviews, but people did not always indicate

with the questionnaire that it is easy to make a comparison of different variables of values. One

explanation that many participants found this insight type during the interviews is that a simple

comparison of two values, entities, or graphs was already coded as Comparison. Participants 2, for

example, said during the think-aloud interview of visualization 4: “A short look to the different

evenings to see if there is a difference in body temperature.” This quote is coded as Comparison and

thus with the insight-based methodology we conclude that participants can achieve insight type

Comparison from this visualization. In the questionnaire for visualization 4, participants 2 gave a

median value of 4.5 on 7 for insight type Comparison, which indicates that the participant somewhat

agree that it was easy to get this insight type from the visualization.

Based on the results found in this study, we conclude that people are almost always able to make a

comparison between different variables and values in a visualization, but this does not necessarily

mean that they find it easy to make this comparison. Knowing if a participant experienced the

comparison of different entities as easy can be considered as a useful advantage of the developed

questionnaire.


No unambiguous results were found for the other four insight types for both methodologies over all

five visualizations. The results show that for one visualization more insight types were gained with

one method and for another visualization higher scores were found with the other method. No clear

reason can be given why these results were found. Again, a possible explanation can be given based

on the different ways of measurement of the two methods. During the think-aloud interviews,

participants had to explore the visualizations without any triggering towards finding an insight type.

Contrary, in the questionnaires we asked towards how easy a specific insight type can be achieved

from the visualization. It could be the case that a participant was not able to achieve an insight type

during the think-aloud interviews, but discovered that it is easy to get this insight type from the

visualization when reading the statements from the questionnaire.

Epstein et al. (2016) showed that users stop tracking, because the visualizations did not give or no

show the information the user expected to receive. This indicates that users have expectations

before starting to discover the visualizations. These expectations can be related to those insight

types that participants did not discovered during the think-aloud interviews in this study. Therefore,

we argue that it is important to consider all possible insight types when evaluating data-

visualizations and not only those that participants will come up with during discovering these

visualizations.

3.4 Conclusion

The main goal of this second study was to compare results found with the developed questionnaire

with results of a qualitative method, both on visualization and insight level. We expected to find

similar results with the questionnaire as with the think-aloud interviews used in this insight-based

methodology. Based on the results, we can conclude that this, in general, is not the case. Different

insight types were found with the questionnaire as with the think-aloud method for four of the five

visualizations. However, some different patterns were found between the results of the two

methods.

With the analysis on visualization level, we showed that differences between the two methods can

be explained based on the data that is presented or the abstraction level of the graphs used in the

visualizations. Furthermore, the analysis on insight level showed that differences between insights

can also be explained by the different ways the methods measure if an insight type can be gained

from the visualizations. In this analysis on insight level, we also gave the advantage for the use of the

questionnaire for evaluating data-visualizations.


4 General Discussion

The main of aim of this master thesis was to create a methodology to evaluate personal informatics

visualizations on insights in a time-efficient and easy way. In the first study, we successfully created

this methodology in the form of a questionnaire. In the second study, we experienced that the

questionnaire is a time-efficient and easy methodology to evaluate visualizations.

A second goal was to find out to which extent the insights found with the developed questionnaire

can be compared with insights found with a qualitative method. The results of the second study

showed that there are differences between the results of both methods. Some of these differences

can be explained based on the different ways insight types are measured with the developed

questionnaire and the insight-based methodology.

The main difference is that with the developed questionnaire, each participant evaluated all six

insight types. Contrary, with the insight-based methodology participants had to discover the

different insight types without any guiding towards these insight types, which make it possible that a

participant could not gain a specific insight type, just because he did not think of this way of

discovering the visualization.

Another difference, related to this main difference, can be found in the way the methods decide if

an insight type could be gained or not. Where the think-aloud interview only consider if an insight

type could be gained or not, the questionnaire focused on how easy a participant finds it to find an

insight type.

Based on these differences between the methods and the different results we found with both

methods, we were not able to prove the concurrent validity of the questionnaire.

However, we showed in the first study that the statements used in the questionnaire are related to

the six insight types. Furthermore, we showed in the second study that with the questionnaire we

cannot only measure if participants could find an insight type, but also if they found it easy to

achieve this insight type. This is an advantage of the questionnaire over the insight-based

methodology.

Based on this advantage and the fact that the developed questionnaire is more time-efficient and

easier to use, we argue that despite of the differences found with both methods, the developed

questionnaire can potentially be used for evaluating visualizations.

Beside the clear advantages of using the questionnaire to evaluate personal informatics

visualizations, we also found some results we did not expected.

In the first study, participants had to classify 55 statements towards one of the eight insight types.


An average classification rate of 54.76 (SD=10.66) was found. Which means that around half of the

statements were classified towards the insight type for which the statement was created. However,

all 55 statements were based on quotes which were coded towards these insight types in previous

studies. These results indicate that with the qualitative methods often quotes are categorized by a

researcher to an insight type, which could be coded to a different insight type by other researchers.

It is not implausible that the same bias occurred during the coding of the think-aloud interviews in

the second study. This, again, showed the complexity of the qualitative methods to evaluate

visualizations on insights.

The insight type ‘self-reflection’ was only found in 40 of the 75 think-aloud interviews (Appendix F).

This is little compared to the results found by Choe et al. (2017). They also evaluated a visualization

with the insight-based methodology and in their study ‘self-reflection’ was the insight type that was

found the second most.

To give the participants the possibility to self-reflect on the data, the questions related to self-

reflection started with the sentence: “Imagine that the presented data is yours…”. Also during the

think-aloud interview, the researcher asked the question: “Imagine that this data is yours, what can

you learn from this data?”. Nevertheless, it can still be difficult to reflect on data that is not yours,

therefore we also coded quotes where the participants reflected in name of the person as self-

reflection. Example quotes are: “there are no measurements for these two days, I think he was not

wearing his device (P10, visualization 1)” and “Probably he realized that he was really lazy and

decided to do something (P9, visualization 2)”.

This illustrates that even when asking the participants to imagine that the presented data is theirs, it

is still difficult to evaluate visualization in this way. This is in line with findings of previous studies

(Thudt, Lee, Choe, & Carpendale, 2017).


5 Limitations

Several limitations arose during the two studies. In this section we address these limitations such

that researcher can take them into account in further research.

In the first study, participants had to fill in an online survey. After sharing this survey with fellow

students, we noticed that many participants stopped after getting the descriptions of the different

insight types (table 1). This can indicate that these participants could not understand these

descriptions. Because many participants did fill in the questionnaire at that point and we did not

want to influence the results of this study, we decided to not make any modification to the survey.

A second limitation is related to the design of the visualizations. Despite that the visualizations were

based on existing visualizations, there were some design aspects that could hinder participants to

gain insights. The created data-visualizations were given on a laptop, while the existing visualizations

all came from mobile fitness applications. The interactivity between visualizations on a mobile phone

can differ from the interactivity the participants had on the laptop. P6 for example said: “I recognize

this visualization from my Fitbit, but normally you can easily zoom in.”

A third limitation of the second study is the skills of the participants. All participants were students

at Eindhoven university of technology (TU/e) and registered in the database of the Human-

Technology group. Students at a technological university have some basic knowledge in statistics.

Therefore, it is possible that they were also better in comprehending the graphs and in getting

knowledge from the data-visualizations. This can explain the high number of insights that were

gained by the participants. As described in the introduction, nowadays everyone can start tracking

their data, which means that self-trackers without any experience with graphs should be able to get

insights from the visualizations as well.

A fourth limitation is also related to the participants, participating in the second study. During the

think-aloud interview, the researcher noticed that some of the participants were wearing a smart

watch. These participants could be more experienced with the data-visualizations than the other

participants. Participants 13, for example, said during the interview: “I recognize this from my Fitbit.”

Unfortunately, we did not ask the participants during the study if they had any experience with self-

tracking. The results do not indicate that these participants had an influence on the results we

found. Nevertheless, more research need to be done to see if there is a relation between the

experience of the self-tracker and the way he evaluates personal informatics visualizations.


6 Future Research

In the introduction, we explained that people stop tracking, because they are not able to get the

right insights from the visualizations. However, the results of the second study illustrated that

existing visualizations already contain most of these insight types. This indicates that it is not only

important to know which insight types people can get from the visualizations, but also which types

they want to get from it. Questions that need to be answered in future research are: “which insight

types are most important for self-trackers?” and “Are certain insight types more important for one

type of visualizations and data measurements than for another?”.

The results of the second study showed that the graphs used in a visualization and the data

presented in these visualizations influences the differences and similarities between the two

methodologies. In his master thesis, fellow student René Van Knippenberg investigated the relation

between abstraction level of graphs and the insights people can find. In his study, he also used the

insight-based methodology to measure the insight types participants found. Further research needs

to be done to investigate the same relation between graphs and insight types, but measured with

the developed questionnaire.

Some insights were found by every participant, where for other insights only half of the participants

indicated that they could get these insights. Further research can be done towards these individual

differences. Are there different groups of self-trackers? Or which factors can explain the differences

between individuals? Till now, few research is done towards different types of self-trackers. In her

master thesis, Christina Vestergaard investigated the influence of personality types on acceptance

and adoptance of self-tracking devices. This a first step towards categorizing self-trackers in different

groups.

There were not only Individual differences between the insights participants found, but also

between the insights a participant found with one method compared to the other method. One

reason, we did not mention yet, can be that some participants did not feel confident during the

think-aloud interview or had difficulties to talk at loud about everything they did and thought. In this

study, we gave the participants the chance to practice the method. However, it still possible that,

even after practicing, some participants are more fluent in it than others. Further research can be

done towards the relation between how confident people are during the think-aloud interview and

the insights people can get.


7 Conclusion

In this master thesis we presented a questionnaire that can be used to evaluate personal informatics

visualizations. This questionnaire will make it easier for researchers and developers to evaluate their

data visualizations on the insights people can get from, before they are presented to the user.

We showed that based on the growing field of the quantified-self domain and the complexity of

current evaluating methods, there was a need for a new time-efficient and easy tool to evaluate

personal informatics visualizations.

This tool was created in the first study of this master thesis in the form of questionnaire. The results

of this study showed that the statements used in this questionnaire are strongly related to the

different insight types. Based on the way these statements were created, we conclude that they can

be used to evaluate visualizations.

The comparison of this questionnaire with a qualitative method, performed in the second study,

showed the similarities and differences of our tool with this method. Furthermore, the results

showed some advantages of the questionnaire. Not only the availability of an insight type can be

measured, but also how easy it is for the user to get this insight type from the visualization.

Based on the results found in these two studies, we conclude that the developed questionnaire can

potentially be used as an easy an efficient way to evaluate personal informatics visualizations.


References

About the Quantified Self. (2016, November 01). Retrieved April 01, 2018, from

http://quantifiedself.com/about/

Choe, E. K., Lee, N. B., Lee, B., Pratt, W., & Kientz, J. A. (2014). Understanding quantified-

selfers' practices in collecting and exploring personal data. In Proceedings of the 32nd annual ACM

conference on Human factors in computing systems, 1143-1152.

Choe, E. K., Lee, B., Zhu, H., Riche, N. H., & Baur, D. (2017). Understanding Self-Reflection:

How People Reflect on Personal Data through Visual Data Exploration. In Proceedings of the 11th EAI

International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth’17).

ACM, New York, NY, USA, 10.

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (Eds.). (1999). Readings in information

visualization: using vision to think. Morgan Kaufmann.

Chen, C. (2010). Information visualization. Wiley Interdisciplinary Reviews: Computational

Statistics, 2(4), 387-403.

Choe, E. K., Lee, B., Schraefel, M. C. (2015). Characterizing visualization insights from

quantified selfers' personal data presentations. IEEE computer graphics and applications, 35(4), 28-

37.

Collins, E. I., Cox, A. L., Bird, J., & Cornish-Tresstail, C. (2014, December). Barriers to

engagement with a personal informatics productivity tool. In Proceedings of the 26th Australian

Computer-Human Interaction Conference on Designing Futures: the Future of Design (pp. 370-379).

ACM.

Consolvo, S., McDonald, D. W., & Landay, J. A. (2009, April). Theory-driven design strategies

for technologies that support behavior change in everyday life. In Proceedings of the SIGCHI

conference on human factors in computing systems (pp. 405-414). ACM.

Cuttone, A., Lehmann, S., & Larsen, J. E. (2013). A mobile personal informatics system with

interactive visualizations of mobility and social interactions. In Proceedings of the 1st ACM

international workshop on Personal data meets distributed multimedia (pp. 27-30). ACM.

Ellis, G., & Mansmann, F. (2010). Mastering the information age solving problems with visual

analytics. In Eurographics (Vol. 2).

Epstein, D. A., Caraway, M., Johnston, C., Ping, A., Fogarty, J., & Munson, S. A. (2016).

Beyond Abandonment to Next Steps: Understanding and Designing for Life after Personal

Informatics Tool Use. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

. CHI Conference, 2016, 1109–1113. http://doi.org/10.1145/2858036.2858045

Epstein, D. A., Ping, A., Fogarty, J. and Munson, S. (2015). A lived informatics model of

personal informatics. Proceedings of the 2015 ACM International Joint Conference on Pervasive and

Ubiquitous Computing - UbiComp '15, 731-741.

http://quantifiedself.com/about/


Etkin, J. (2016). The Hidden Cost of Personal Quantification. Journal of Consumer Research,

Volume 42(6), 967–984. https://doi.org/10.1093/jcr/ucv095

Fan, C., Forlizzi, J., & Dey, A. K. (2012, September). A spark of activity: exploring informative

art as visualization for physical activity. In Proceedings of the 2012 ACM Conference on Ubiquitous

Computing (pp. 81-84). ACM.

Few, S., & Edge, P. (2008). What ordinary people need most from information visualization

today. Perceptual Edge: Visual Business Intelligence Newsletter, 1-7.

Flood, J. (2013). Wearable Computing Devices, ABI Research. Retrieved from

https://www.abiresearch.com/press/wearable-computing-devices-like-apples-iwatch-will/

Forsell, C., & Cooper, M. (2012, October). Questionnaires for evaluation in information

visualization. In Proceedings of the 2012 BELIV Workshop: Beyond Time and Errors-Novel Evaluation

Methods for Visualization (p. 16). ACM.

Govaerts, S., Verbert, K., Duval, E., & Pardo, A. (2012, May). The student activity meter for

awareness and self-reflection. In CHI'12 Extended Abstracts on Human Factors in Computing

Systems (pp. 869-884). ACM.

Hogan, T., Hinrichs, U., & Hornecker, E. (2016). The Elicitation Interview Technique:

Capturing People's Experiences of Data Representations. IEEE transactions on visualization and

computer graphics, 22(12), 2579-2593.

Kay, M., Choe, E. K., Shepherd, J., Greenstein, B., Watson, N., Consolvo, S., & Kientz, J. A.

(2012, September). Lullaby: a capture & access system for understanding the sleep environment.

In Proceedings of the 2012 ACM conference on ubiquitous computing (pp. 226-234). ACM.

Keim, D. A. (2002). Information visualization and visual data mining. IEEE transactions on

Visualization and Computer Graphics, 8(1), 1-8.

Keim, E. D., Kohlhammer, J., & Ellis, G. (2010). Mastering the Information Age: Solving

Problems with Visual Analytics, Eurographics Association.

Kerren, A., Plaisant, C., & Stasko, J. T. (2011). Information visualization: State of the field and

new research directions. Information Visualization, 10(4), 269.

Kerren, A., Stasko, J., Fekete, J. D., & North, C. (Eds.). (2008). Information Visualization:

Human-Centered Issues and Perspectives (Vol. 4950). Springer.

Kersten-van Dijk, E. T., Westerink, J. H., Beute, F., & IJsselsteijn, W. A. (2017). Personal

informatics, self-insight, and behavior change: A critical review of current literature. Human–

Computer Interaction, 32(5-6), 268-296.

Khot, R. A., Lee, J., Aggarwal, D., Hjorth, L., & Mueller, F. F. (2015). Tastybeats: Designing

palatable representations of physical activity. In Proceedings of the 33rd Annual ACM Conference on

Human Factors in Computing Systems (pp. 2933-2942). ACM.


Khot, R. A., Pennings, R., & Mueller, F. F. (2015). EdiPulse: supporting physical activity with

chocolate printed messages. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts

on Human Factors in Computing Systems (pp. 1391-1396). ACM.

Li, I., Dey, A. & Forlizzi, J. (2010). A stage-based model of personal informatics systems. Proc.

CHI '10, 557-566.

Li, I., Dey, A. K., & Forlizzi, J. (2011). Understanding my data, myself: supporting self-

reflection with ubicomp technologies. In Proceedings of the 13th international conference on

Ubiquitous computing, 405-414.

Marcengo, A., & Rapp, A. (2014). Visualization of human behavior data: the quantified self.

Innovative approaches of data visualization and visual analytics, 1, 236-265.

Moere, A. V. (2008, July). Beyond the tyranny of the pixel: Exploring the physicality of

information visualization. In Information Visualisation, 2008. IV'08. 12th International

Conference (pp. 469-474). IEEE.

North, C. (2006). Toward measuring visualization insight. IEEE computer graphics and

applications, 26(3), 6-9.

Plaisant, C., Fekete, J. D., & Grinstein, G. (2008). Promoting insight-based evaluation of

visualizations: From contest to benchmark repository. IEEE transactions on visualization and

computer graphics, 14(1), 120-134.

Robertson, G., Czerwinski, M., Fisher, D., & Lee, B. (2009). Selected human factors issues in

information visualization. Reviews of human factors and ergonomics, 5(1), 41-81.

Saraiya, P., North, C., & Duca, K. (2005). An insight-based methodology for evaluating

bioinformatics visualizations. IEEE transactions on visualization and computer graphics, 11(4), 443-

456.

Rooksby, J., Rost, M., Morrison, A., & Chalmers, M.C. (2014). Personal tracking as lived

informatics. Proc. CHI’14, 1163-1172. https://doi.org/10.1145/2556288.2557039

Sharmin, M., & Bailey, B. P. (2013, June). ReflectionSpace: an interactive visualization tool

for supporting reflection-on-action in design. In Proceedings of the 9th ACM Conference on Creativity

& Cognition (pp. 83-92). ACM.

Smuc, M., Mayr, E., Lammarsch, T., Bertone, A., Aigner, W., Risku, H., & Miksch, S. (2008,

November). Visualizations at First Sight: Do Insights Require Training?. In Symposium of the Austrian

HCI and Usability Engineering Group (pp. 261-280). Springer, Berlin, Heidelberg.

Statista. (2017, February). Number of connected wearable devices worldwide from 2016 to

2021 (in millions). Retrieved from https://www.statista.com/statistics/487291/global-connected-

wearable-devices/


Stusak, S., Tabard, A., Sauka, F., Khot, R. A., & Butz, A. (2014). Activity sculptures: Exploring

the impact of physical visualizations on running activity. IEEE Transactions on Visualization and

Computer Graphics, 20(12), 2201-2210.

Thudt, A., Lee, B., Choe, E. K., & Carpendale, S. (2017). Expanding research methods for a

realistic understanding of personal visualization. IEEE computer graphics and applications, 37(2), 12-

18.

Verdezoto, N., & Grönvall, E. (2016). On preventive blood pressure self-monitoring at

home. Cognition, Technology & Work, 18(2), 267-285.

Yang, H., Li, Y., & Zhou, M. X. (2014). Understand users’ comprehension and preferences for

composing information visualizations. ACM Transactions on Computer-Human Interaction

(TOCHI), 21(1), 6.

Zhao, O. J., Ng, T., & Cosley, D. (2012, February). No forests without trees: particulars and

patterns in visualizing personal communication. In Proceedings of the 2012 iConference (pp. 25-32).

ACM.


Appendices

Appendix A The 55 statements participants had to classify to one of the eight insight types.

Table A1 shows the 55 statements that were created based on the quotes found in previous

literature and the descriptions of the insight types given by Choe et al. (2015) (Table 2). The

statements are ordered per insight type.

Table A1: Statements participants had to classify

Insight Type Statement

Detail “It is easy to find the exact running distance”

Detail "it is easy to get the difference between my highest and lowest heart rate today.”

Detail "It is easy to check the highest number of steps of last week”

Detail “The legends, next to it, are helping to better understand the different colors used."

Detail “It is easy to find the current heart rate (bpm).”

Detail “It is easy to explain the meaning of the different colors.”

Detail “At the end of the day, it clearly shows which different types of physical activities were done that day.”

Detail “It is easy to find the highest and lowest heart rate measured today”

Comparison “It is easy to find the days of the week with more physical activities than the rest of the week.”

Comparison “The combination of both the sleep data and the physical activity data gives a good explanation of what the relation is between them.”

Comparison “Knowing the average hours of sleep people have, it gives the opportunity to quickly see what the relation towards this average is.”

Comparison "it is easy to see the differences and similarities in sleep between two specific nights.”

Comparison “It is easy to compare the happiness level with the number of hours slept.”

Comparison "it is easy to compare two of my latest runs, based on the speed and the distance.”

Comparison “It is easy to compare different days of the week”

Comparison "It is easy to find out what this physical level means in relation to other people".”

Correlation “It gives me the opportunity to find positive or negative relations between different measurements.”

Correlation “It is easy to see that sleep gets poor when stress is higher.”

Correlation “A negative relation between two different data types can easily be found in this visualization.”

Correlation “From the data, it is easy to tell what the relationship between the hours of sleep and the number of steps is.”

Correlation “If there is a relation between hours of sleep and number of steps, it would be


easy to find this relation.”

Correlation "It is easy to compare different measurements and the relation between these measurements.”

Data Summary “It is easy find the total number of runs done.”

Data Summary “It is easy to quickly find the average distance and speed over all tracked running activities. “

Data Summary “It is easy to check the total number of data points.”

Data Summary "It is easy to get a clear overview of all the collected data based on time and number of data points.”

Data Summary “The data is summarized in a nice way.”

Data Summary “It is easy to see for how long that data is tracked.”

Data Summary “A good overview of all the measured data from beginning till now is given.”

Trend “It is easy to see if there is a tendency in the number of steps daily taken.”

Trend “it gives the opportunity to check in which way the resting heart rate changed over the last two years.”

Trend "it gives me a clear overview of the different waking up time throughout the week.”

Trend “It is easy to find recurring tendencies in the data.”

Trend “It is easy to find if the heart rate in rest is increased or decreased over a longer period of time.”

Trend “It is easy to describe changes over a longer time.”

Self-reflection “There is a lot of contradiction between what is shown and the things I did today.”

Self-reflection “The collected data is in line with my expectations, based on what occurred today…”

Self-reflection “It is easy to make predictions based on this data.”

Self-reflection “It is in line with what was already known about my sleeping behavior.”

Self-reflection “Good predictions of how my physical activity would evolve are made.”

Self-reflection “It is possible to explain the atypical values in the data, based on things that occurred that day. ”

Self-reflection “It is possible to explain the data, knowing that the person wasn’t in an everyday affair, like for example an injury.”

Self-reflection “Based on what I did today, I didn’t expect to find this.”

Distribution "It is easy to see how much percentage of the time is spent on sleeping, walking, studying…”

Distribution it’s easy to find out if the physical activity level is different for different periods in time.”

Distribution It is easy to tell during which parts of a day the heart rate is variating more.”

Distribution “It is possible to explain how one day is divided in sleeping, working, studying, leisure time, and physical activity.”

Distribution "It is shown in a clear way how much of the time the person is sitting, running, walking, or doing other activity.”

Distribution "It is easy to check in which way the physical activities are spread on a monthly basis.”


Outlier “It is easy to find those days that the person was on a city-trip and was walking all day.”

Outlier “It is possible to find those nights that the person didn’t sleep last month.”

Outlier "it is easy to identify data points that are totally different from all other points.”

Outlier “It is easy to find measurements that are not in line with similar measures before and after.”

Outlier “it is easy to find the days the person had to do a job interview, because the average stress level is a lot higher those days.”

Outlier “It is easy to see that there are some measurements which are totally different from the other measurements.”


Appendix B

Informed consent used in the first study of this master thesis

Informed consent form

This document gives you information about the study “Getting Insights from Quantified Self Visualizations”. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.

Aim and Procedure of the study

The aim of this study is to find out if an initial pool of items is assessing the terms they are created

for. This information is used to further develop a methodology for evaluating quantified-self data

visualizations. This study is performed by Karsten Paulussen, a student under the supervision of

dr.ir. Peter Ruijten and Els Kersten – Van Dijk MSc of the Human-Technology Interaction group.

In this survey, you will get 55 statements. For each statement you should indicate to which insight it refers best, so to which insight you think the statement is related. You always have to choose between 8 different insights. On the next page short definitions of the insights are given. During the survey you can always reread this definitions.

Duration

The study will last approximately 10-15 minutes.

Voluntary & Confidentiality

Your participation is completely voluntary. You can refuse to participate without giving any reasons

and you can stop your participation at any time during the study. You can also withdraw your

permission to use your data up to 24 hours after the study is finished.

All research conducted at the Human-Technology Interaction Group adheres to the Code of Ethics of the NIP (Nederlands Instituut voor Psychologen – Dutch Institute for Psychologists).

The information that we collect from this study is used for writing scientific publications and will only be reported at group level. It will be completely anonymous and it cannot be traced back to you.

Further information

If you want more information about this study you can ask Karsten Paulussen (contact email:

[email protected]).

If you have any complaints about this study, please contact the supervisor, Peter Ruijten ([email protected]).

Certificate of Consent

By clicking the 'next' button, you agree that you have read and understood this consent form and

you give your informed consent for voluntary participation in this research.


Appendix C

Final Questionnaire created in the first study of the Master Thesis

As described in the discussion section of the first study, the final questionnaire exists of 24

statements. Four each statement, participants have to indicate on a 7-point scale if they agree with

the statement or not. In Figure C1 the statements are ordered per insight type.

Figure C1. The final questionnaire, six statements are included for each insight type. Following statements belong to the following insight types: 1-4: ‘detail’, 5-8: ‘Correlation’, 9-12: ‘Data Summary’, 13-16: ‘Comparison’, 17-20: ‘Trend’, and 21-24: ‘Self-reflection’.


Appendix D

Modified Questionnaire for each of the five visualizations

Figure B1 gave the final questionnaire created in the first study of this master thesis. To use this

questionnaire for evaluating the visualizations in the second study, we had to create a modified

version for each visualization (Figure D1 – D5).

Three different types of modification needed to be done to make the questionnaire applicable for

evaluating a visualization.

Firstly, the variables used in the statements are changed to variables that are shown in the

visualizations. For example, for the first visualization, the statement “it is easy to compare the

happiness level with the number of hours slept” is changed to “It is easy to compare the minutes

lightly active with the minutes very active.”

Secondly, the visualizations presented in the second study were created from one-month Fitbit data

from one of the supervisors or simulated. This means that participants had to self-reflect on data,

they did not collect themselves. Therefore, the statements corresponding to the insight type self-

reflection started with the sentence: “Imagine the data presented in the visualizations is yours: …” In

this way, we try to give the participants the possibility to self-reflect on this data.

Thirdly, the statements are presented in a random order.


Figure D1. Modified Questionnaire to evaluate Visualization 1.


Appendix E

Informed Consent used in the second study of this master thesis

Informed consent form

This document gives you information about the study “Discovering data-visualizations”. Before the study begins, it is important that you learn about the procedure followed in this study and that you give your informed consent for voluntary participation. Please read this document carefully.

Aim and benefit of the study

The aim of this study is to measure the information you get and the knowledge you gain from

quantified-self visualizations. These visualizations are made from data people can collect

themselves, like physical activity, sleep patterns, heart rate, etc. This information is used further

develop a methodology to evaluate quantified-self visualizations on the information and knowledge

they offer.

This study is performed by Karsten Paulussen, a student under the supervision of dr. E.T. Kersten – van Dijk of the Human-Technology Interaction group.

Procedure

During this study, you will get to see and evaluate 6 different graphs. For each of these graphs, we

ask you to freely explore the graph and think-aloud about what you can learn about it and the

information you get form the graph. The researcher will sit next to you and can ask extra questions

based on what you learn from the graph. When you think you told everything about a graph, you can

indicate this to the researcher and a next graph is shown. Audio recordings are made during this

study. After you explored all graphs, you will complete a few questionnaires, in which you will get

some statements you will rate on a 7-item scale. After filling in these questionnaires, the study ends

and you are free to ask questions.

Risks

The study does not involve any risks or detrimental side effects.

Duration

The study will last approximately 40 minutes.

Participants

You were selected because you were registered as participant in the participant database of the

Human Technology Interaction group of the Eindhoven University of Technology.

Voluntary Your participation is completely voluntary. You can refuse to participate without giving any reasons

and you can stop your participation at any time during the study. You can also withdraw your

permission to use your data up to 24 hours after the study is finished. All this will have no negative

consequences whatsoever.


Compensation

You will be paid 7 euros (€2.00 extra if you do not study or work at the TU/e or Fontys Eindhoven).

Confidentiality

All research conducted at the Human-Technology Interaction Group adheres to the Code of Ethics of

the NIP (Nederlands Instituut voor Psychologen – Dutch Institute for Psychologists).

We will not be sharing personal information about you to anyone outside of the research team. The

audio recordings that are made, will not be distributed and will not be played back in the presence

of persons other than the researchers. The material will be used only for scientific analysis. The

information that we collect from this study is used for writing scientific publications and will only be

reported at group level. It will be completely anonymous and it cannot be traced back to you.

Further information

If you want more information about this study you can ask Karsten Paulussen (contact email:

[email protected]

If you have any complaints about this study, please contact the supervisor, dr. Els Kersten – van Dijk

(contact email: [email protected]).

Certificate of Consent

I, (NAME)……………………………………….. have read and understood this consent form and have been

given the opportunity to ask questions. I agree to voluntarily participate in this study carried out by

the research group Human Technology Interaction of the Eindhoven University of Technology.

Participant’s Signature Date


Appendix F

An overview of the number of times a specific insight type is found and the medians of medians for

each insight type for all five visualizations (Table F1 and Table F2).

Table F1: Frequency of insight types for think-aloud interviews

Frequency

Type Vis. 1 Vis. 2 Vis. 3 Vis. 4 Vis. 5 Total Frequency

Detail 15 15 15 13 14 72

Comparison 15 13 15 14 15 72

Correlation 0 10 12 11 13 46

Data Summary 9 7 9 11 13 49

Trend 8 7 3 14 4 36

Self-reflection 8 5 11 8 8 40

Table F2: Median of Medians with IQR for every insight type from Questionnaire

Median of medians (IQR)

Type Vis. 1 Vis. 2 Vis. 3 Vis. 4 Vis. 5

Detail 6.5 (1.0) 6.5 (0.5) 5.5 (2.0) 6.0 (1.0) 6.5 (1.5)

Comparison 6.0 (1.5) 5.5 (1.5) 4.0 (3.5) 4.5 (1.5) 5.0 (1.0)

Correlation 3.5 (3.0) 5.0 (1.0) 3.0 (2.5) 3.5 (1.5) 4.0 (1.5)

Data Summary 6.0 (1.0) 5.5 (1.5) 4.0 (3.5) 5.0 (2.5) 5.0 (2.5)

Trend 5.5 (2.0) 6.0 (1.0) 2.5 (1.0) 6.0 (1.5) 5.0 (2.5)

Self-reflection 5.5 (1.0) 5.5 (1.0) 4.0 (2.0) 4.5 (1.5) 5.0 (2.5)