29-11-2016
Discover the world at Leiden University
Big data analytics + cases in the health domain
Wessel Kraaij
Privacy?
• Big Five personality traits:
  • Openness
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism
• Who would want to display their Big Five profile online?
Personality profile (PIP): Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
Example predictive analytics: Facebook personality study
Evaluation of the model
Dutch National Research Agenda (Nationale Wetenschapsagenda)
• National Research Agenda: 25 of the 140 cluster questions are related to Big Data
• Using Big Data Responsibly: Searching for Patterns in Large Datasets (Big Data Verantwoord Gebruiken: Zoeken naar Patronen in Grote Gegevensbestanden)
• https://vragen.wetenschapsagenda.nl/
Big data & analytics
Examples of big data
The ‘traditional’ internet: 45 billion indexed pages
Google contributed MapReduce and the Google File System (the designs on which Hadoop is based)
Amazon popularized cheap cloud services
Imagery data: digital video (YouTube adds 6000 hrs of video per hour = 13 TB/hr)
Satellite data, astronomical data (LOFAR): 140 PB/day
Measurement data: genome sequences, weather, air quality
Human sensors: 10 million smartphones in the Netherlands, always on
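The figures above can be sanity-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes). It also reads "13 TB/hr" as the storage needed for the 6000 hours of video uploaded each hour. Under those assumptions it prints an average of about 4.8 Mbit/s per stored video and a sustained rate of about 1.62 TB/s for LOFAR.

```python
# Back-of-the-envelope check of the data rates quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def implied_bitrate_mbps(video_hours_per_hour: float, tb_per_hour: float) -> float:
    """Average bitrate (Mbit/s) implied by storing `tb_per_hour` TB for
    `video_hours_per_hour` hours of uploaded video every hour."""
    bytes_per_video_hour = tb_per_hour * TB / video_hours_per_hour
    return bytes_per_video_hour * 8 / SECONDS_PER_HOUR / 10**6


def daily_to_per_second_tb(pb_per_day: float) -> float:
    """Convert a daily volume in PB into a sustained rate in TB/s."""
    return pb_per_day * PB / SECONDS_PER_DAY / TB


print(f"YouTube: ~{implied_bitrate_mbps(6000, 13):.1f} Mbit/s per stored video stream")
print(f"LOFAR:   ~{daily_to_per_second_tb(140):.2f} TB/s sustained")
```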
Capturing and storing data is a commodity. Now the real challenge is combining datasets and SENSE MAKING
Making sense of big data requires dealing with:
Heterogeneity (variety): combining different information sources, often mentioned as the game changer
Information quality (noisy, uncertain and missing data; provenance, veracity, traceability, auditing)
Volume: the large amount of data
Velocity: the speed at which data becomes available
The complexity of the system
From raw data to information:
Analyze, understand, reason, decide, act
Extracting meaning
Image credits: Rose Business Technologies, mskcc.org, searchengineland.com, rtvdrenthe.nl, onweer-online.nl
The different kinds of data analytics (Gartner, 2013)
Descriptive analytics: summarize and organize data into actionable information
Ex: transcribing speech, semantic annotation of images, video, measurements
Diagnostic analytics: causal inference
What caused a certain observation?
Predictive analytics:
1. Predicting preferences by looking at similar people (using correlations)
2. Predicting what will happen (states, gradients) using historical data
Prescriptive analytics: suggest quantified decision options, making use of the predictions and associated cost models
E.g. health care, smart factories, etc.
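A minimal sketch can illustrate the predictive and prescriptive steps.

- `predict_rating` implements textbook user-based collaborative filtering (point 1 of predictive analytics). Similarity between users is the Pearson correlation on co-rated items.
- `best_action` picks the option with the lowest expected cost. This is one simple reading of "quantified decision options making use of the predictions and associated cost models".

The users, items, probabilities, and costs are hypothetical.

```python
from math import sqrt


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def predict_rating(ratings: dict[str, dict[str, float]], user: str, item: str) -> float:
    """Predictive analytics (1): user-based collaborative filtering.
    Predict `user`'s rating of `item` from other users' deviations from their
    own mean, weighted by their correlation with `user` on co-rated items."""
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        common = [i for i in mine if i in theirs]
        if len(common) < 2:
            continue  # too little overlap to estimate similarity
        w = pearson([mine[i] for i in common], [theirs[i] for i in common])
        their_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - their_mean)
        den += abs(w)
    return my_mean if den == 0 else my_mean + num / den


def best_action(p_event: float, costs: dict[str, tuple[float, float]]) -> str:
    """Prescriptive analytics: choose the action with the lowest expected cost,
    given P(event) from a predictive model and a cost table
    {action: (cost if the event happens, cost if it does not)}."""
    return min(costs, key=lambda a: p_event * costs[a][0] + (1 - p_event) * costs[a][1])


ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cas": {"a": 1, "b": 5, "c": 2, "d": 1},
}
print(round(predict_rating(ratings, "ann", "d"), 2))
print(best_action(0.3, {"intervene": (10, 10), "wait": (50, 0)}))
```

In the toy data, Ann's ratings correlate negatively with Cas's, and Cas dislikes item d. The prediction for d therefore lands above Ann's mean of 4. With P(event) = 0.3, intervening (expected cost 10) beats waiting (expected cost 15).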
Limitations: Societal impact
Transparency:
in the future we can trace the origin of our food,
but also: a car insurer can detect a risk-prone driving style
Personalization:
personalized learning, personalized health care, etc.
but also: the risk of abuse of an information position
Society should define limits
Ex: Google Glass has not been successful due to perceived privacy breaches
Challenge: design an architecture with personalized control of the privacy-utility trade-off
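One well-known technique for giving individuals control over the privacy-utility trade-off is differential privacy with a user-chosen privacy parameter ε. The slide does not prescribe a technique, so this is an illustration under stated assumptions, not a proposed design. The sketch adds Laplace noise with scale sensitivity/ε to a value before it is shared. The shared quantity (a fraction in [0, 1], hence sensitivity 1) and the per-user settings are hypothetical.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon: a smaller epsilon means more
    privacy (more noise) and less utility."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def release(true_value: float, sensitivity: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: add Laplace(0, b) noise before a value leaves the device.
    The difference of two i.i.d. exponentials with rate 1/b is Laplace(0, b)."""
    b = laplace_scale(sensitivity, epsilon)
    return true_value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


# Hypothetical per-user privacy settings: each user chooses their own epsilon.
user_epsilon = {"alice": 0.1, "bob": 5.0}
# Hypothetical value to share: fraction of today's working time classified as stressful.
rng = random.Random(42)
for user, eps in user_epsilon.items():
    print(user, round(release(0.4, sensitivity=1.0, epsilon=eps, rng=rng), 3))
```

Alice's setting (ε = 0.1) yields very noisy, highly private releases. Bob's (ε = 5) trades privacy for accuracy.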
Limitations
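EU: Data Protection Directive (95/46/EC)
EU: General Data Protection Regulation (GDPR)
Netherlands: Wet bescherming persoonsgegevens (Dutch Personal Data Protection Act)
Key principle (purpose limitation):
Data collected for a specific purpose cannot be used for other purposes without explicit consent
This limits the combination of datasets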
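Purpose limitation, and its effect on combining datasets, can be modelled as a toy access check. `Dataset`, `may_combine`, and the purpose labels are hypothetical. The rule is a simplification for illustration, not a legal test.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    # Purposes the data subjects explicitly consented to (hypothetical labels).
    consented_purposes: set[str] = field(default_factory=set)


def may_combine(datasets: list[Dataset], purpose: str) -> bool:
    """Purpose limitation as a toy rule: datasets may only be combined for a
    purpose that every one of them has explicit consent for."""
    return all(purpose in d.consented_purposes for d in datasets)


fitness = Dataset("fitness_tracker", {"health_coaching"})
calendar = Dataset("work_calendar", {"health_coaching", "scheduling"})
print(may_combine([fitness, calendar], "health_coaching"))  # True
print(may_combine([fitness, calendar], "scheduling"))       # False: no consent for fitness data
```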
Limitations: spurious correlations
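Correlation does not imply causation [example from the thesis of T. Claassen]
Three causal structures that all produce a positive correlation between cannabis use and depression:
1. Cannabis → Depression (+)
2. DNA/Brain → Cannabis and DNA/Brain → Depression (+): a common cause
3. Depression → Cannabis (+)
Only in the first situation will banning cannabis help reduce depression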
STUDY EXAMPLE: SWELL PROJECT
Project focus: help knowledge workers change bad habits and improve their creativity and effectiveness
SWELL, one of the COMMIT/ projects 2011-2017
SWELL ambition
Support people to
Manage their work
Manage their health
Improve their well-being
How: Use sensors and ICT for:
1. Observe
2. Interpret
3. Act
This is not a new idea
1924-1932: experiments at the Hawthorne Works
Can workers become more productive at lower or higher levels of
light?
Outcome: the effect on productivity was positive, but short-lived
>> Hawthorne effect: people change their behaviour because they know they are being observed <<
Cf. Chris Anderson’s TEDx talk “Living by Numbers” http://vimeo.com/26182608
Microsoft Office: Clippy
Result of a Microsoft project on using Bayesian techniques to infer user goals
and provide feedback (PI Eric Horvitz)
Some reasons for failing:
No persistent user profiles
No adaptation to the user's competence level
Reasoning based on very little data
Distinct, unconnected activity and content models
The development team replaced the utility-function-based interaction
algorithm with a simplistic rule-based engine.
http://robotzeitgeist.com/tag/clippy
http://en.wikipedia.org/wiki/Office_Assistant
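The difference between a Bayesian, utility-based design and a fixed rule can be sketched as follows. This is not the Office Assistant's actual model. The goals, likelihoods, and utilities are hypothetical, and a naive-Bayes posterior stands in for whatever model the project used. Help is offered only when the expected benefit outweighs the expected annoyance of an interruption, i.e. when P(goal) > annoyance / (benefit + annoyance). With the defaults, that threshold is 1/3.

```python
# Hypothetical model: prior over user goals and P(observed action | goal).
PRIOR = {"write_letter": 0.2, "make_table": 0.3, "just_typing": 0.5}
LIKELIHOOD = {
    "write_letter": {"types_dear": 0.7, "inserts_table": 0.05},
    "make_table": {"types_dear": 0.05, "inserts_table": 0.6},
    "just_typing": {"types_dear": 0.1, "inserts_table": 0.1},
}


def posterior(observations: list[str]) -> dict[str, float]:
    """Naive-Bayes posterior P(goal | observations), assuming the observed
    actions are conditionally independent given the goal."""
    scores = {}
    for goal, p in PRIOR.items():
        for obs in observations:
            p *= LIKELIHOOD[goal][obs]
        scores[goal] = p
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}


def offer_help(post: dict[str, float], goal: str,
               benefit: float = 1.0, annoyance: float = 0.5) -> bool:
    """Utility-based decision: interrupt only if the expected utility of offering
    help for `goal` is positive, i.e. P(goal) * benefit > (1 - P(goal)) * annoyance."""
    p = post[goal]
    return p * benefit - (1 - p) * annoyance > 0


post = posterior(["types_dear"])
print({g: round(p, 2) for g, p in post.items()})
print(offer_help(post, "write_letter"))           # True: posterior is high enough
print(offer_help(posterior([]), "write_letter"))  # False: the prior alone is too weak
```

A rule such as "if the user types Dear, offer letter help" would fire regardless of the posterior or of the user's profile. The slide lists this kind of behaviour among the reasons for failure.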
Existing applications
RSI prevention
Workrave
Using Google services gives Google a very precise picture of parts of a user's life.
This is used for serving relevant ads.
The better the user model, the higher the revenue through AdSense.
Discussion
Workrave has a preconceived idea of what is good for a PC user
Too intrusive
Clippy makes an initial attempt at guessing what a user wants to do,
combined with a proactive interaction style.
Lack of personalization / adaptation / interaction style choice
Google’s goal is to create a complete user model, but respecting privacy
conflicts with its business model
Conclusions: users want to be in control; early apps are not adaptive, since
they do not learn over time and do not adapt to individual preferences
Maybe we need to learn more about humans …
The need for a more refined model of human behaviour and HCI
Goal: support for self management of workload, health etc.
Requirements / success factors at application level
Does not restrict autonomy; limited impact on privacy
Must be easy to use
What is difficult for humans:
Balancing short-term and long-term optimization
Self-perception is coloured
Memory is imperfect, selective and coloured
It is easy to under- or overestimate the required effort
Reflective Practice (Donald Schön)
Reflection on experience as the basis for learning
Reflection in action: experience guides action
Reflection on action: reflecting on one's reaction to a situation after the fact,
exploring reasons and consequences
Recommendations for reflective practice
Keeping a journal;
Seeking feedback;
View experiences objectively; and
Taking time at the end of each day, meeting, experience etc. to
reflect-on-actions.
Schön, D. (1983) The Reflective Practitioner, How Professionals Think In Action, Basic Books
Learning theories
Transtheoretical model for behavioural change
Dominant model for change of health related behaviour
Prochaska, JO.; DiClemente, CC. The transtheoretical approach. In: Norcross, JC;
Goldfried, MR. (eds.) Handbook of psychotherapy integration. 2nd ed. New York: Oxford
University Press; 2005. p. 147–171.
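The transtheoretical model suggests matching feedback to a person's stage of change. The staging rule below follows the commonly used time windows of the model:

- intention to change within 30 days, or within 6 months;
- behaviour already changed, either up to 6 months ago or longer.

The feedback messages and the break-taking scenario are hypothetical.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PRECONTEMPLATION = "precontemplation"
    CONTEMPLATION = "contemplation"
    PREPARATION = "preparation"
    ACTION = "action"
    MAINTENANCE = "maintenance"


def stage_of_change(months_since_change: Optional[float],
                    intends_within_days: Optional[int]) -> Stage:
    """Rule-of-thumb staging: behaviour already changed -> action (up to 6 months)
    or maintenance (more than 6 months); otherwise stage by how soon the person
    intends to change (within 30 days, within 6 months, or not at all)."""
    if months_since_change is not None:
        return Stage.MAINTENANCE if months_since_change > 6 else Stage.ACTION
    if intends_within_days is None or intends_within_days > 180:
        return Stage.PRECONTEMPLATION
    return Stage.PREPARATION if intends_within_days <= 30 else Stage.CONTEMPLATION


# Hypothetical, non-judgemental stage-matched feedback for a worker who takes too few breaks.
FEEDBACK = {
    Stage.PRECONTEMPLATION: "Here is an overview of your breaks this week.",
    Stage.CONTEMPLATION: "What would you gain from taking more breaks?",
    Stage.PREPARATION: "Shall we plan two short breaks for tomorrow?",
    Stage.ACTION: "You took all planned breaks today. Well done!",
    Stage.MAINTENANCE: "Still going strong after months. Any busy weeks ahead?",
}

print(FEEDBACK[stage_of_change(None, 14)])
```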
Summary
Hypothesis 1: Human effectiveness can be significantly improved with
reflection and introspection that questions assumptions and frames.
“Thinking out of the box” “(Avoid) tunnel vision” “Take the blinkers off”
Hypothesis 2: It is useful to recognize different stages of behavioural
change and provide appropriate feedback
How can we support reflective practice?
Provide tools for objective logging and activity analysis
Stimulate people to take time for reflection by making it easy.
Provide non-judgemental and stimulating feedback
Involve peers / colleagues / friends
SWELL: Main hypothesis
Self-lifelogging (recording activities and inferring mental/physical state) can be used to improve the well-being of knowledge workers by:
Supporting behavioural changes
Improving self-efficacy and self-knowledge
Main data analytics challenges:
SENSE: interpreting human activity and mental and physical condition on the basis of a combination of various types of unobtrusive, low-level sensors
REASON: reasoning about activities and the consequences of different action alternatives
ACT: personalizing the motivational feedback for health and well-being applications
Affective computing
Privacy-respecting data processing
Causal inference
Real-time classification
Combination of unobtrusive sensors
Personalized, context-sensitive recommendation
Explainable AI
Learning to coach from user feedback
Persuasive technology
[Figure: SWELL workflow. The worker / patient is sensed (state & context); the sensor data are interpreted in context; a customised recommendation is created; the system acts (nudge).]
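The SWELL workflow (sense, interpret in context, recommend, nudge) can be sketched as a single loop iteration. Everything here is hypothetical:

- Heart rate is used only as an example signal.
- The 10% margin over a personal baseline is arbitrary.
- The only context considered is whether the person is in a meeting.

Comparing against a personal baseline reflects the emphasis on individual differences. The context check reflects that a nudge at the wrong moment is intrusive.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Context:
    in_meeting: bool  # hypothetical context signal, e.g. from the calendar


def interpret(heart_rates: list[float], personal_baseline: float) -> str:
    """Interpret: map raw sensor readings to a coarse state, relative to the
    user's own baseline (the 10% margin is an arbitrary illustration)."""
    return "elevated" if mean(heart_rates) > 1.1 * personal_baseline else "normal"


def recommend(state: str, ctx: Context) -> Optional[str]:
    """Create a recommendation, or stay silent when a nudge would be unwelcome."""
    if state == "elevated" and not ctx.in_meeting:
        return "Your heart rate has been elevated for a while. Time for a short walk?"
    return None


def swell_step(heart_rates: list[float], personal_baseline: float,
               ctx: Context) -> Optional[str]:
    """One pass through sense -> interpret -> act (nudge)."""
    return recommend(interpret(heart_rates, personal_baseline), ctx)


print(swell_step([80, 85, 90], personal_baseline=70, ctx=Context(in_meeting=False)))
```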
Study example: Affective computing to measure stress and mental effort
SWELL Results (Koldijk)
Promising results with unobtrusive stress monitoring
Methodology for designing stress interventions, grounded in theory and operationalized by sensor technology and taking into account privacy concerns
Several prototype m-health apps
Koldijk, S., Bernard, J., Ruppert, T., Kohlhammer, J., Neerincx, M.A., & Kraaij, W. (2015). Visual Analytics of Work Behavior Data - Insights on Individual Differences. In: Proceedings of EuroVis 2015.
Koldijk, S., Neerincx, M.A., & Kraaij, W. (2016). Detecting work stress in offices by combining unobtrusive sensors. IEEE Transactions on Affective Computing.
Koldijk, S., Kraaij, W. & Neerincx, M.A. (2016). Deriving Requirements for Pervasive Well-Being Technology From Work Stress and Intervention Theory: Framework and Case Study. JMIR Mhealth Uhealth 2016;4(3):e79.
Research on: Sensing & Interpretation
Controlled variable: induced stress
Outcomes:
Perceived Stress (VAS)
Emotion (SAM)
Mental Effort (RSME)
Task Load (NASA-TLX)
How could stress or related aspects be measured with (unobtrusive) sensors?
Unobtrusive measurements
The SWELL Knowledge work dataset for stress and user modeling research [koldijk,2014]
Controlled experiment with 25 subjects
Three blocks: neutral, stressor 1, stressor 2
Sensor measurements and questionnaires
Koldijk, S., Sappelli, M., Verberne, S., Neerincx, M.A., & Kraaij, W. (2014). The SWELL Knowledge Work Dataset for Stress and User Modeling Research. In: Proceedings of the 16th ACM International Conference on Multimodal Interaction (ICMI 2014).
Kraaij, W. (Radboud University & TNO), Koldijk, S. (TNO & Radboud University), & Sappelli, M. (TNO & Radboud University) (2014). The SWELL Knowledge Work Dataset for Stress and User Modeling Research. DANS. http://dx.doi.org/10.17026/dans-x55-69zp
Sappelli, M., Verberne, S., Koldijk, S., & Kraaij, W. (2014). Collecting a dataset of information behaviour in context. In: Proceedings of the 4th Workshop on Context-awareness in Retrieval and Recommendation (CARR @ ECIR 2014), Amsterdam, The Netherlands, 13-16 April 2014.
Feature values are averaged per minute
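Averaging feature values per minute turns irregular sensor streams into one row per minute, which can then be aligned across modalities. This is a minimal sketch that assumes each sample is a (timestamp in seconds, value) pair for a single feature. The actual SWELL preprocessing may differ.

```python
from collections import defaultdict
from statistics import mean


def average_per_minute(samples: list[tuple[float, float]]) -> dict[int, float]:
    """Bucket (timestamp_in_seconds, value) samples by minute and average each bucket.
    Minutes without samples are simply absent from the result."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for t, value in samples:
        buckets[int(t // 60)].append(value)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


print(average_per_minute([(0, 1.0), (30, 3.0), (61, 5.0), (119, 7.0), (185, 2.0)]))
# {0: 2.0, 1: 6.0, 3: 2.0}
```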
Identifying the working condition
Conclusion: Sensors do record different behaviour in stressor conditions
10-fold cross-validation
Modality comparison
Conclusion: Posture is the strongest feature modality, Facial helps
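This sketch shows 10-fold cross-validation written without external libraries. The nearest-centroid classifier is a stand-in for the classifiers used in the study, such as the SVM mentioned on the next slide. The feature vectors are hypothetical. A random split over per-minute samples places data from the same participant in both training and test folds. This can make the scores optimistic for new users.

```python
import random
from statistics import mean


def fit_centroids(X: list[list[float]], y: list[int]) -> dict[int, list[float]]:
    """Nearest-centroid 'training': the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids: dict[int, list[float]], x: list[float]) -> int:
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def k_fold_accuracy(X: list[list[float]], y: list[int], k: int = 10, seed: int = 0) -> float:
    """Shuffle once, split into k folds, train on k-1 folds, test on the held-out
    fold, and return the mean accuracy over the k folds."""
    if len(X) < k:
        raise ValueError("need at least k samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_centroid(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return mean(scores)


# Hypothetical per-minute feature vectors: 0 = neutral, 1 = stressor condition.
X = [[0.01 * i, 0.0] for i in range(20)] + [[5 + 0.01 * i, 5.0] for i in range(20)]
y = [0] * 20 + [1] * 20
print(k_fold_accuracy(X, y))
```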
Predicting subjective mental state/effort
Sensor data seem to be most powerful for predicting mental effort (RSME)
Model tree regression is the best performer (correlation 0.83)
Facial expression features are the best predictors, followed by posture.
How important are individual differences?
Identifying the working condition: full-population SVM: 90% accuracy (majority baseline 62%)
Adding participant ID as a feature: no change!
Testing on an unseen user (leave-one-subject-out): average 59% (min 37.5%, max 88.34%)
=> performance for new users might be very low
Predicting mental state: full-population regression model: correlation 0.83
Adding participant ID as a feature: 0.94
Testing on an unseen user (leave-one-subject-out): 0.03
=> performance for new users might be very low
The second task in particular is sensitive to individual differences
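The "test on unseen user" numbers come from holding out all data of one participant at a time. Below is a minimal splitter plus evaluation loop, assuming a subject ID is available for every sample. `fit` and `predict` are placeholders for any model. The toy usage shows how a model that works for the training participants can score 0% on a held-out one.

```python
from typing import Callable, Iterator


def leave_one_subject_out(subjects: list[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held-out subject, train indices, test indices); all samples of the
    held-out subject go to the test set together."""
    for s in sorted(set(subjects)):
        test = [i for i, sid in enumerate(subjects) if sid == s]
        train = [i for i, sid in enumerate(subjects) if sid != s]
        yield s, train, test


def evaluate_loso(X: list, y: list, subjects: list[str],
                  fit: Callable, predict: Callable) -> dict[str, float]:
    """Per-subject accuracy of a model trained on all *other* subjects."""
    scores = {}
    for s, train, test in leave_one_subject_out(subjects):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores[s] = sum(predict(model, X[i]) == y[i] for i in test) / len(test)
    return scores


def fit_majority(X: list, y: list) -> int:
    """Toy 'model': the most frequent label in the training data."""
    return max(set(y), key=y.count)


def predict_majority(model: int, x: list) -> int:
    return model


# Each participant's labels differ from everyone else's, so held-out accuracy collapses.
scores = evaluate_loso([[0], [0], [0], [0]], [1, 1, 0, 0], ["a", "a", "b", "b"],
                       fit_majority, predict_majority)
print(scores, min(scores.values()), max(scores.values()))
```

Reporting the mean, minimum, and maximum over held-out participants mirrors the "average 59% (min 37.5%, max 88.34%)" summary above.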
Towards a subtype-based analysis
Idea: cluster subjects, then train and evaluate a classifier per subtype
Hierarchical clustering (for determining k) and k-means
Number of subjects per subtype in parentheses:
Modality | Subtype 1 | Subtype 2 | Subtype 3 | General
Computer | Writers (16): 0.17 | Copy-pasters (9): 0.34 | n/a | 0.15
Facial | Low expression (16): 0.79 | Eyes wide & mouth tight (3): 0.81 | Tight eyes & loose mouth (6): 0.87 | 0.81
Posture | Sits still & moves right arm (5): 0.76 | Restless body & calm wrist (6): 0.85 | Average movement (14): 0.69 | 0.59
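The subtype idea can be sketched in two steps: cluster participants on per-person feature summaries, then train one model per cluster. The sketch implements plain k-means (Lloyd's algorithm) with the first k points as initial centres. It assumes k is already chosen; the slide uses hierarchical clustering for that. The participants and their feature summaries are hypothetical.

```python
from statistics import mean


def sq_dist(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: list[list[float]], centres: list[list[float]]) -> list[int]:
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: sq_dist(p, centres[c])) for p in points]


def kmeans(points: list[list[float]], k: int,
           max_iter: int = 100) -> tuple[list[int], list[list[float]]]:
    """Lloyd's algorithm, initialised with the first k points as centres."""
    centres = [list(p) for p in points[:k]]
    labels = assign(points, centres)
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [mean(col) for col in zip(*members)]
        new_labels = assign(points, centres)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centres


def group_by_cluster(subject_ids: list[str], labels: list[int]) -> dict[int, list[str]]:
    """Subjects per subtype; a separate model would then be trained per group."""
    groups: dict[int, list[str]] = {}
    for sid, lab in zip(subject_ids, labels):
        groups.setdefault(lab, []).append(sid)
    return groups


# Hypothetical per-participant summaries, e.g. (mean movement, mean facial expressiveness).
people = ["p1", "p2", "p3", "p4", "p5", "p6"]
summaries = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
labels, _ = kmeans(summaries, k=2)
print(group_by_cluster(people, labels))  # {0: ['p1', 'p3', 'p4'], 1: ['p2', 'p5', 'p6']}
```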
Conclusions & Next Steps
SWELL KW dataset: a multimodal dataset for research in user modeling and affective computing
We can distinguish stressful working conditions in a controlled setting using unobtrusive sensors (posture is the best feature)
Mental effort can best be estimated using facial expression data
Individual differences play a significant role
A hierarchical approach using subtypes is promising
Plan: run field study
Field study at BZK
Validate stress markers
Takeaways
Big data offer a large potential for new business and scientific research
Ethical considerations should be taken into account when designing big data processing systems
Affective computing and behavioural analytics are potentially powerful techniques for developing digital personal assistants