Hcic muller guha davis geyer shami 2015 06-29
Developing Data-Driven Theories via Grounded Theory Method and via Machine Learning
1
Michael Muller, Shion Guha*,
Matthew Davis, Werner Geyer,
Sadat Shami
IBM Research and IBM
* Returning to Cornell University at the end of the summer
Working with Theory
• Approaches to the use of theory in HCI and CSCW
Approach            Characterization             Validation and Next steps
Hypothesis testing  Top-down evaluation          Generalization
Induction           Bottom-up rich description   Comparison
Abduction           Develop new theory           Cycles of description, analysis, modification
• This paper is not about a theory
– Grounded theory is not a theory
• It is a collection of methods for developing a theory
– Machine learning is not a theory
• It is a collection of methods for developing a theory or a description or a
prediction
• What is surprising: the Conundrum
– Grounded theory methods and machine-learning methods
seem to have much more in common than expected
2
Outline
• Introduction
– Conundrum: Convergence of Grounded Theory and Machine Learning?
– Sketch of Grounded Theory (GT)
– Sketch of Machine Learning (ML)
– What this talk is not about
• Conundrum
– Examples
• Two similarities and One Dissimilarity
– Modeling “up” from the data
– Modeling “down” from a priori premises
– Rigor
• Restating the Conundrum
– A call to question
– A call to action
3
Sketch of Grounded Theory
• Combination of an open mind with rigor
• One way to approach a new domain
– … or a domain without a dominant organizing theory
• Intermeshing of data collection, theorizing, evaluating,
reflecting, iterating
– Collect some data
– Make a preliminary theory before data collection is complete
– Critique the developing theory, test it, change it, improve it
– Using methods that have proven heuristically useful over time
• Guided, in part, by abductive reasoning
[Figure: repeated cycle of constant comparison between “theory about data” and “data about theory”]
4
“Grounded theory methods consist of simultaneous data
collection and analysis, with each informing and focusing the
other throughout the research process. As grounded theorists,
we begin our analysis early to help us focus further data
collection. In turn, we use these focused data to refine our
emerging analyses. Grounded theory entails developing
increasingly abstract ideas about research participants’ meanings, actions, and worlds and seeking specific data to fill
out, refine, and check the emerging conceptual categories...” (Charmaz, 2006)
5
“Machine learning is the construction and study of algorithms
that can learn from and make predictions on data … such
algorithms operate by building a model from example inputs in
order to make data-driven predictions or decisions rather than
following strictly static program instructions.”
- (Bishop, 2006)
6
Sketch of Machine Learning
• Unsupervised learning
– Often exploratory and less “rigorous”
– Often no pre-determined hypothesis but want to play with data
– Often no ideas about relationships between variables
– Examples: clustering
• Supervised learning
– We have some ideas about dependent and independent variables
– We often have some ideas about possible hypotheses
– We want to predict or ascertain causal relationships between variables
– Examples: classification and regression
7
The Conundrum
8
Surprising Convergences in Ways of Thinking and Knowing
Bottom-Up Inquiry
• Grounded Theory Method
– Initially unorganized data
– Constant comparison of theory and data
– Descriptive theory is built from data up into theory
• Machine Learning
– Initially unorganized data
– Iterative development of classifications or relations
– Descriptive classifications are built from data up into theory
Top-Down Inquiry
• Grounded Theory Method
– Apply coding families to make theoretical sense of data
– Constant comparison of theory and data
• Machine Learning
– Apply theorized categories and test for fit of data
– Iterative refinement of classifications or relations
9
Example A: Machine Learning about Persons
(Michelle Zhou)
10
http://www.slideshare.net/MichelleZhou1/system-u-computational-discovery-of-personality-traits-from-social-media-for-individualized-experience
Example B: Grounded Theory about Persons
11
Clarke, Adele & Star, Susan Leigh (2008). The social worlds framework: A theory/methods package. In Edward
Hackett, Olga Amsterdamska, Michael Lynch & Judy Wajcman (Eds.), The handbook of science and technology
studies (pp. 113-139). Cambridge, Massachusetts: The MIT Press.
Mathar, Tom (2008). Making a mess with situational analysis? Forum: Qualitative Social Research /
Forum Qualitative Sozialforschung, 9(2), Art. 4.
Example B: Grounded Theory: Codes to Classify People
“[W]ith the inclusion of theoretical concepts of the primary study
such as typologies it is even possible to use an inductive procedure.
For example, provided that category schemas have the same
heuristic function as a huge "filing box" with broad, and not "a priori"
theory-loaded categories, then their use for secondary analysis does
not have to conflict with open coding in the process of the
development of in-vivo categories.” (Medjedović and Witzel, 2006)
12
More Detailed Examination of Methods
• We’ve seen a few examples. Is there more to this convergence
than those examples?
                                        Grounded Theory                     Machine Learning
– Deriving categories from data         Discovery of Codes and Categories   Labeling and Exploring
– Applying a priori categories to data  Applying Codes to Data              Training and Testing
– Rigor                                 Abductive Logic                     Validating and Predicting
13
Exploring Data with Grounded Theory: Discovery of Codes and Categories
14
How to Use the Affect of Surprise in Data and Theory
15
Muller, M. (2014). Curiosity, creativity, and surprise as analytic tools: Grounded theory method. In J. Olson and W.A.
Kellogg (Eds.), Ways of knowing in HCI. Springer.
An Imagined Inquiry into Organizational Work Practices
• A new(ish) domain – how to start?
– Choose a “site” == a person or persons in a role? a job title? Not sure yet
– Open codes – individual, group, team
– Open codes – time-pressured, quality-focused, client-driven
• Begin to integrate our tentative knowledge
– Axial code – Collaboration-preference
– Axial code – Value-priority
• But we’ve also heard about
– Communities of practice
– Centers of excellence
– Networks (?)
– Councils (?)
• If these are collections of employees, how do they map onto groups, teams?
– We’re still being surprised. Let’s find out more!
– Talk with people in these new-to-us collaborative configurations
Codes so far:
Collaboration-preference
•Individual
•Group
•Team
•(other collaborations)?
(Required structures)?
Value-priority
•Time-pressured
•Quality-focused
•…
16-23
Problem for “Preference”: Individuals in multiple roles
• More interviewing…
– Each employee can be in multiple collaborations
– … and can have a different role in each
– It’s not a matter of “collaboration-preference”
• Are there different types of collaborations, each of which has its own distinct relationships?
– Re-read our interview transcripts
– Re-visit our memos
– Collect more interview data (or other types of data?)
• Teams and groups appear to be in different genres
– Return to our earlier observation that there are also communities, centers, councils, networks…
– And each genre seems to entail a different set of relationships
Codes so far:
Collaboration-preference (Collaboration style? Collaboration configurations?)
•Individual
•Group
•Team
•Community of practice
•Center of excellence
•Council
•Network
•…
(Required structures)?
Collaboration role
Value-priority
•Time-pressured
•Quality-focused
•…
24-26
Discovering Codes Summary
• We started with an unexamined, quasi-essentialist notion that
individuals had preferred ways of collaborating
• We then discovered that at least some people had multiple
collaborative relations, with different structures
• We eventually understood that the manner of collaborating
was more a matter of the collaboration structures, which
required (?) or offered (?) different collaboration roles
• Additional questions, if we decide that we want our grounded
theory analysis to go in these directions
– Are structures and their roles required? offered?
– Do the attributes of individual employees matter? Do people have
preferred collaboration roles? Do their preferences influence what
types of collaboration structures they join?
– What other types of collaboration structures are there?
– …
27
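The revisions summarized above can be pictured as edits to a small codebook. The sketch below is only an illustration: the `Codebook` class, its method names, and the storage layout are assumptions made for this example, not a tool used in the inquiry. The code labels are taken from the slides, and "collaboration-configuration" follows the slides' tentative "Collaboration configurations?" label.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical container: axial code name -> list of open codes."""
    axial: dict = field(default_factory=dict)

    def add_open_code(self, axial_code, open_code):
        # Group an open code under an axial code, creating the axial code if needed.
        codes = self.axial.setdefault(axial_code, [])
        if open_code not in codes:
            codes.append(open_code)

    def rename_axial(self, old, new):
        # Re-label an axial code when new data show the old name no longer fits.
        self.axial[new] = self.axial.pop(old)


book = Codebook()
for code in ["individual", "group", "team"]:
    book.add_open_code("collaboration-preference", code)
for code in ["time-pressured", "quality-focused", "client-driven"]:
    book.add_open_code("value-priority", code)

# Surprise: interviews mention other kinds of collaboration.
for code in ["community of practice", "center of excellence", "council", "network"]:
    book.add_open_code("collaboration-preference", code)

# More interviews: people hold different roles in different collaborations,
# so "preference" is the wrong abstraction and the axial code is renamed.
book.rename_axial("collaboration-preference", "collaboration-configuration")

for axial_code, open_codes in book.axial.items():
    print(axial_code, "->", ", ".join(open_codes))
```

The point of the sketch is that the category name is itself data-dependent: the open codes persist while the abstraction that organizes them changes.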
Exploring Data with Machine “Learning”: Discovery of Clusters and Labels
28
A Classical Example of Learning from Data: Fisher’s Irises
29
A Classical Example of Learning from Data: Fisher’s Irises
30
Learning from Learning: Fisher’s Irises
31
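The figures for the iris slides did not survive in this transcript. As a stand-in, here is a minimal sketch of unsupervised clustering (plain k-means, written with the standard library only). The nine petal measurements are iris-style values typed in by hand for illustration and should be checked against the published dataset. The choice of two clusters and of the starting centroids is also an assumption of this sketch.

```python
import math

# Petal length and petal width (cm) for nine flowers. Hand-typed illustrative
# values; verify against the published data before relying on them.
petals = [
    (1.4, 0.2), (1.4, 0.2), (1.3, 0.2),
    (4.7, 1.4), (4.5, 1.5), (4.9, 1.5),
    (6.0, 2.5), (5.1, 1.9), (5.9, 2.1),
]


def nearest(point, centroids):
    # Index of the centroid closest to the point (Euclidean distance).
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, centroids, max_iter=100):
    # Plain k-means: alternate assignment and centroid update until stable.
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        centroids = [
            mean([p for p, l in zip(points, labels) if l == i]) if i in labels else centroids[i]
            for i in range(len(centroids))
        ]
    return labels, centroids


# Assumption: start from the first and last flowers as initial centroids.
labels, centroids = kmeans(petals, [petals[0], petals[-1]])
print(labels)
```

The algorithm returns cluster indices, not names. Deciding what cluster 0 "means" is the analyst's act of labeling, much as a grounded theorist names a category after seeing which data fall under it.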
Theorizing from Codes: Grounded Theory tries not to impose theory or
sets of categories prematurely… right?
32
“The Abstraction of the New”
Star (2007): “Codes allow us to know about the field we study,
and yet carry the abstraction of the new… When this process is
repeated, and constantly compared across spaces and across
data… this is known as theoretical sampling… Theoretical
sampling stretches the codes, forcing other sorts of knowledge
of the object… taking a code and moving it through the data…
fractur[ing] both code and data.”
33
“The Abstraction of the New”
Hernandez (2009): “ ‘Substantive codes conceptualize the
empirical substance of the area of research. Theoretical codes
conceptualize how the substantive codes may relate to each
other as hypotheses to be integrated into the theory’ (Glaser,
1978). Substantive codes break down (fracture the data) while
theoretical codes ‘weave the fractured story back together
again’” (Glaser, 1978, p. 72)...
34
A Priori Coding Structures
Paradigm
(Strauss & Corbin, 1990)
• Causal conditions
• Phenomena
• Context
• Intervening conditions
• Action / interaction strategies
• Consequences
6 Cs
(Glaser, 1978)
• Causes
• Contingencies
• Context
• Conditions
• Covariance
• Consequences
35
Glaser’s Coding Families
Basics (Glaser, 1998)
• Social process
• Social structural process
• Structural conditions
• Social psychological process
• Psychological process
6 Cs (Glaser, 1978)
• Causes
• Context
• Contingencies
• Consequences
• Covariance
• Conditions
Degree (Glaser, 1978)
• Ranks
• Grades
• Continuum
• Levels
• Limit
• Range
• Intensity
• Extent
• Amount
Process (Glaser, 1978)
• Stages
• Staging
• Phases
• Phasing
• Progressions
• Passages
• Transitions
• Trajectories
• Gradations
• Steps
• Shaping
• Ranks
• Ordering
• Chains
• Sequencing
• Temporaling
• Cycling
Boundary (Glaser, 1998)
• Limits, Outer limits, Confidence limits, Front line, Deviance
• Boundary maintaining mechanisms
• Tolerance zones, Transitional zones
Means-Goals (Glaser, 1978)
• End
• Purpose
• Goal
• Product
• Anticipated consequences
Unnamed coding family (Glaser, 2005)
• Asymptote Theoretical Codes (family) (getting as close as possible)
• Fractals Theoretical Codes (family)
• Autopoiesis Theoretical Codes (family) (e.g., structural coupling)
36-38
Glaser’s Approach to Coding and Theory
“Over the past three decades, Glaser has identified many
theoretical codes and theoretical coding families that can
emerge in grounded theory: 18 in Theoretical Sensitivity (Glaser,
1978), 9 in Doing Grounded Theory (Glaser, 1998), and 23 in
Theoretical Coding (Glaser, 2005).
… When more than one theoretical code can fit the data, then
the researcher must make a choice but this decision will be
‘grounded in one of the many useful fits’ (Glaser, 1978).”
(Hernandez, 2009)
39
Glaser’s Approach to Coding and Theory
“Glaser… provides… 40 theoretical coding families (Glaser 1978;
1998; 2005), and he admits that the list is far from exhaustive…
[A] selection of recommended theoretical texts for the
identification of the widest possible range of theoretical codes
would be helpful for users of Glaser’s GT.” (Christiansen, 2008)
40
Coding Structures Summary
• The foundational text (Discovery, Glaser and Strauss, 1967)
contains the seeds of two distinct a priori ways of structuring
an inquiry:
– General theory of action (The Paradigm) (Strauss and Corbin, 1990)
– Coding families (Glaser, 1978, 1998, 2005)
• Not all of the coding families or phases of action will apply in
every case. Analysis finds which ones provide good
descriptive fit.
• For our purposes, coding families appear to be similar to
potential predictor dimensions or dummy variables in a
supervised machine learning paradigm, which must also be
tested for fit.
41
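The analogy between coding families and dummy variables can be made concrete. In the sketch below, the six Cs become the columns of a 0/1 indicator matrix over coded interview segments. The segment ids and the codes assigned to them are hypothetical, invented only to show the shape of the data, and `to_dummies` is a helper name introduced for this example.

```python
# Assumption: the six Cs serve as the columns of an indicator matrix.
SIX_CS = ["causes", "context", "contingencies", "consequences", "covariance", "conditions"]

# Hypothetical coded interview segments: segment id -> codes the analyst applied.
coded_segments = {
    "seg-01": {"causes", "consequences"},
    "seg-02": {"context"},
    "seg-03": {"conditions", "context", "covariance"},
}


def to_dummies(codes, family):
    # One 0/1 indicator per code in the family, in the family's order.
    return [1 if code in codes else 0 for code in family]


matrix = {seg: to_dummies(codes, SIX_CS) for seg, codes in coded_segments.items()}

# Column sums show how often each code in the family was actually applied; a
# code that is never applied is a candidate for poor descriptive fit.
usage = {code: sum(row[i] for row in matrix.values()) for i, code in enumerate(SIX_CS)}

for seg, row in matrix.items():
    print(seg, row)
print(usage)
```

In this toy data the "contingencies" column stays at zero, which corresponds to the observation above that not every code in a family will apply in every case.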
Exploring Data with Machine Learning: Predictions and Classifications
42
Philosophy of Machine Learning
• Unsupervised learning – There is a set of inputs that needs to
be divided into groups in some meaningful way. We don’t
know anything about these groups a priori but want some
sense of grouping based on some other attributes.
• Supervised learning – We have a set of inputs and know their
level of measurement (nominal, ordinal, interval or ratio). We
want to align some other unseen inputs into a model that will
produce an output based on the level of measurement
(classification for nominal or ordinal variables and regression
for interval or ratio variables). This is often considered
prediction.
• Both approaches help us build theoretical knowledge from a
set of data.
43
Unsupervised Learning (clustering)
44
Supervised Learning (Classification)
45
Supervised Learning (Regression)
46
A Classical Example of Prediction: Back to Fisher’s Irises!
47
A Classical Example of Prediction: Back to Fisher’s Irises!
48
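The prediction figures for these slides are missing from the transcript. As a stand-in, the sketch below shows supervised classification with a held-out test set, using a nearest-centroid classifier written with the standard library only. The classifier choice, the train/test split, and the hand-typed iris-style measurements are all assumptions of this example, not the authors' analysis.

```python
import math

# (petal length cm, petal width cm), species. Hand-typed illustrative values.
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.9, 2.1), "virginica"),
]
test = [
    ((1.4, 0.2), "setosa"),
    ((4.9, 1.5), "versicolor"),
    ((5.1, 1.9), "virginica"),
]


def fit_centroids(examples):
    # Training: average the feature vectors of each labeled class.
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    # Prediction: the label whose centroid is closest to the new flower.
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = fit_centroids(train)
predictions = [predict(centroids, features) for features, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

On these few points the third held-out flower lies closer to the versicolor centroid than to the virginica one, so the sketch misclassifies it. That is the kind of misfit a held-out test is meant to expose, and it parallels testing a priori categories against new data.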
Rigor
49
What is Rigor in Machine Learning?
50
What is Rigor in Machine Learning?
51
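The content of the slides on rigor in machine learning did not survive extraction. The methods comparison earlier in the talk pairs rigor in ML with "Validating and Predicting", and one common form of validation is cross-validation. The sketch below assumes that reading. It runs leave-one-out cross-validation with a 1-nearest-neighbor rule on hand-typed, iris-style measurements; both the method and the data are illustrative assumptions.

```python
import math

# Hand-typed illustrative (petal length cm, petal width cm), species rows.
flowers = [
    ((1.4, 0.2), "setosa"), ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.1, 1.9), "virginica"), ((5.9, 2.1), "virginica"),
]


def predict_1nn(training, features):
    # Label of the single closest training example.
    _, label = min(training, key=lambda row: math.dist(features, row[0]))
    return label


def leave_one_out(rows):
    # Hold out each row in turn, train on the rest, and record whether the
    # held-out row is predicted correctly.
    hits = []
    for i, (features, label) in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        hits.append(predict_1nn(training, features) == label)
    return hits


hits = leave_one_out(flowers)
print(sum(hits), "of", len(hits), "held-out flowers predicted correctly")
```

Each flower plays the role of a new "site" chosen to test the current model, which loosely mirrors the grounded theory question "what is the strongest test that could disconfirm what I think is going on?"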
52
What is Rigor in Grounded Theory Method?
• Constant comparison of theory and data, of data and data
• Abductive logic
– How could my nascent theory be wrong? (consider multiple, competing
informal hypotheses)
– What is the strongest test that could disconfirm what I think is going on?
– Go back to the data I already have
– Choose the next “site” to test for disconfirmation
• What is a “site”?
– Person with theoretically-relevant attributes
– Team in the appropriate department or geography
or discipline
– Community that differs from previously-studied
communities in a theoretically-important way
– Organization or enterprise with significant
contrasts to those that I have already studied
53
Constant Comparison → Constant Questioning
“Consistent with the logic of grounded theory, theoretical
sampling is emergent. Your developing ideas shape what you do
and the questions you pose while theoretical sampling.”
(Charmaz, 2006)
54
Conclusion
55
Modeling up from the Data
• Often considered “data-driven” or inductive modeling
• We have a giant set of data – we scour said dataset
with GT or ML and we produce results
• Often these sets of results are considered and iterated together
to develop novel theory
• The process is similar: iteration and re-iteration.
• E.g.,
– ML: Topic Modeling
– GT: Deriving descriptive codes, leading to theoretical codes, from data
56
Modeling Down from a priori Premises
• We start with a well-defined hypothesis.
• We collect data
• We apply a GT coding family or ML predictor (e.g., a
classification) on this data
• We accept or reject our (description or prediction) to make an
inference
• This inference is the backbone of developing novel theory
• Again, the process is similar. Code and confirm.
• E.g.,
– ML: Regression/classification with hypothesis; test for fit
– GT: Apply coding families; test for fit
57
Learning from the Conundrum?
• Despite differences in
– Basic premises
– Methods of inquiry and inference
– Figures of merit
– Criteria for rigor
– Claims of distinctiveness
– ...
• We see many overlaps between ML and GT
– Are we describing basic human ways of knowing and of inferring?
• There are a number of proposals for methodological dialogues
between “big data” and “small data”, or between
“computation” and “inference”
– Does this presentation suggest, not a dialogue, but a fusion?
58