
382C Empirical Studies in Software Engineering Lecture 13

© 2000-present, Dewayne E Perry

Artifacts/Confounding Variables

Dewayne E Perry, ENS 623



Limitations on Knowledge

All experiments are subject to error
- Understand and measure it
- Error does not destroy our opportunities
- It makes us aware of errors and limits

Extraneous variables that vary systematically
- Importance of keeping other variables equal
- Rule out alternative explanations

Two prime sources
- Irrelevant effects of procedures
- Artifacts: biasing effects of investigators and participants


Nature of the Problem

Artifact: a finding that results from factors other than the one intended
- Usually quite extraneous to the intent of the experimenter
- Factors that can jeopardize the validity of the conclusions

Interested in subject-experimenter artifacts
- Must have dependable knowledge about the experimenter-subject (E-S) equation
- Astronomers need to know the effects of their telescopes
- In behavioral experiments, the experimenter is the instrument of observation and manipulation

Subject side of the equation
- Human complexity: no two research subjects behave identically
- The same careful experiment will have different results in different places and times
- Subjects know they are research participants
- The research subject role is well understood


Nature of the Problem

Experimenter side of the equation
- Systematic errors are usually unintentional
- Two classes:
  - Interactional: biases that affect the response of the subject
  - Non-interactional: in the mind, eye, or hand of the experimenter

Control
- A comparison condition to isolate some effect
- A procedure to serve as a check on validity


History of the Problem: Clever Hans

- A horse known for remarkable intellectual feats
- He tapped out, ostensibly with the help of a code table in front of him:
  - answers to mathematical problems
  - the date of any day mentioned
- Psychologist Oskar Pfungst noticed that he responded to unintended cues from his questioners, e.g., body position
- If animals can do this, why not humans?


History of the Problem: Hawthorne Works study (began in 1924)

- Studied how workers' productivity was affected by workplace conditions such as light, temperature, and rest periods
- Both treatment and control groups increased their performance
- Suggested reasons:
  - Workers were flattered to participate
  - Workers were keenly aware of and responsive to task clues
- The Hawthorne effect is now synonymous with the placebo effect, the power of suggestion


History of the Problem: Rosenzweig (1933), a landmark paper

- Argued that the experimental situation is a psychological problem in its own right
- Developed a methodological analysis and taxonomy of certain types of interactions
- Contended that subjects try to guess the purpose of the experiment and give the results expected
  - Called the good subject effect
- Further, the experimenter might unintentionally influence the results


Resistance to the Problem

Why did it take so long for systematic research to begin in earnest? Three suggested reasons:
- The phenomenon of artifacts that stem from playing a subject role presupposes the active influence of conscious cognition
- Concerns about pervasive biases were possibly viewed as impeding the emergence and growing influence of behavioral research
- The logical positivist view placed great faith in the impartiality of research

In the late 1950s, the tenets of the positivists and logical empiricists began to lose their hold, and cognitive science rose


Demand Characteristics

Orne's work on hypnotism and subject expectations
- Coined the term "demand characteristics of the experimental situation"
- Could expectations not also apply to other research?
- Treatment group: presented with a novel characteristic, catalepsy of the dominant hand; control group: no such mention
- Almost all of the treatment group exhibited the catalepsy; none of the control group did
- Typical subject:
  - Attentive to demand characteristics
  - Attempted to behave altruistically, in a way that confirms the experimenter's hypothesis


Motivation

Altruism, evaluation apprehension, and obedience as motivators
- As early as high school, people associate the subject role with such characteristics as being cooperative, alert, and observant
- Subjects do not always enact the altruistic role; there may be other motivations

Aiken and Rosnow (1973)
- 11 statements representing the three motivations were compared with one another and judged as pleasant or unpleasant
- Key situation: being a subject in a psychology experiment
- RR, Figure 6-1: arrows show mean psychological distance
- What do we learn from the map?
  - Being a subject is closely associated with the good subject role
  - Obedience and evaluation also entered into subjects' thinking
  - Participation is a mildly pleasant, work-oriented activity
  - Looking good (evaluation apprehension) is more likely to be dominant than doing good


Task-Oriented Cues

Detection
- Use quasi-control subjects as a possible way to get subjects to figure out what is going on just by thinking about it
- Quasi-controls serve as co-investigators rather than as subjects of the study

Orne suggests three techniques
- Experimental subjects function as their own controls
  - Post-experimental interview
  - Pilot study on their perceptions and beliefs
- Pre-inquiry
  - Quasi-control subjects imagine they are the real subjects
  - They are not subjects, but are given full treatment information
  - Quasi-subjects predict how they might behave
  - Similarity between the quasi-control data and the real data implies that results could be affected by subject guesses
- Blind controls: unaware of their status
  - Compare blind controls to quasi-controls
  - Blind groups are sometimes used as a sacrifice group


Task-Oriented Cues

Alternative: observe the dependent variable in different contexts
- E.g., both inside and outside the lab setting
- E.g., observed by someone other than the experimenters

Orne considered these to be supplementary techniques
- They do not automatically enable us to avoid problems
- We are not always aware of the effects
- Challenge: their subtlety, and teasing them out

An interesting model has been proposed: a preliminary statement, not a theory
- Assumption: subjects are sensitive to the coercive demands of whatever propriety norms may be operating in the experiment
- Focuses on a few intervening variables instead of categorizing artifact-producing variables


Theoretical View

Assumption: artifact-producing variables generalize to a few mediating factors
- Outcomes: compliance, non-compliance, counter-compliance
- Mediating factors: receptivity, motivation, capability

Receptivity: either receptive or not
- If not receptive, then non-compliant

Motivation: either motivated, not motivated, or uncooperative
- If not motivated, then non-compliant
- If uncooperative and capable, then counter-compliant
- If uncooperative and incapable, then non-compliant

Capability: either capable or incapable
- If incapable, then non-compliant
- Otherwise, compliant to the demand characteristics
  - The only path to worry about
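The branching rules on this slide can be written down as a small decision function, which makes it easy to check that every combination of the three factors leads to exactly one outcome. This is a minimal sketch under our own reading of the slide: the enum and function names are hypothetical, and the model itself is presented as a preliminary statement, not a theory.

```python
from enum import Enum
from itertools import product


class Motivation(Enum):
    MOTIVATED = "motivated"
    NOT_MOTIVATED = "not motivated"
    UNCOOPERATIVE = "uncooperative"


class Outcome(Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non-compliant"
    COUNTER_COMPLIANT = "counter-compliant"


def predict_outcome(receptive: bool, motivation: Motivation, capable: bool) -> Outcome:
    """Hypothetical encoding of the receptivity/motivation/capability rules."""
    if not receptive:
        # Demand characteristics are not received, so there is nothing to comply with.
        return Outcome.NON_COMPLIANT
    if motivation is Motivation.NOT_MOTIVATED:
        return Outcome.NON_COMPLIANT
    if not capable:
        # Covers both "motivated but incapable" and "uncooperative and incapable".
        return Outcome.NON_COMPLIANT
    if motivation is Motivation.UNCOOPERATIVE:
        # Uncooperative and capable: works against the perceived demands.
        return Outcome.COUNTER_COMPLIANT
    # Receptive, motivated and capable: complies with the demand characteristics.
    return Outcome.COMPLIANT


if __name__ == "__main__":
    for receptive, motivation, capable in product([True, False], Motivation, [True, False]):
        outcome = predict_outcome(receptive, motivation, capable)
        print(receptive, motivation.value, capable, "->", outcome.value)
```

Enumerating all 12 combinations shows that only receptive, motivated, capable subjects end up compliant, and only receptive, uncooperative, capable subjects end up counter-compliant. Every other combination is non-compliant.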


Prediction and Control

Two objectives of the model
- Visualize systematically how demand characteristics operate
- Provide a blueprint for strategies for reducing or eliminating subject artifacts

The model allows us to generate theoretical predictions about how artifact-producing events might operate in a given situation
- E.g., clarity of demand, and subjects' behavior as a result
- Consider only receptivity and motivation; subjects are hardly ever incapable
- Motivation and receptivity cancel out when demand is very high or very low


Strategies

Receptivity manipulations to minimize demand clarity
- Measure the dependent variable in a setting not obviously connected to the treatment, or employ unobtrusive measurements
  - Ideally, no demand characteristics are received
  - Approximated in field studies with unobtrusive measures
  - Subjects who are unaware have nil receptivity
- Measure the dependent variable removed in time from the treatment
  - The ideal is usually not met; reception of demand is unavoidable
  - Some demand is transmitted by means of the relationship between treatment and test
  - Break the relationship: separate them in time and space
- Employ a Solomon design, or else avoid pretesting (especially in attitude change experiments) and instead employ an after-only design
  - Pretest sensitization is a problem
  - Measure the effect or rule out the pretest
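A Solomon four-group design crosses pretest (given or not) with treatment (given or not). Comparing the treatment effect in the two pretested groups with the effect in the two unpretested groups therefore estimates pretest sensitization. This is a minimal sketch, assuming one posttest score per participant. The function and argument names are hypothetical, and the sketch only computes the contrasts from group means. A real analysis would test them, e.g., with a 2x2 analysis of variance on the posttest scores.

```python
from statistics import mean


def solomon_effects(pre_treat, pre_control, post_only_treat, post_only_control):
    """Contrasts for a Solomon four-group design, computed from posttest scores.

    pre_treat:         pretest, treatment, posttest
    pre_control:       pretest, no treatment, posttest
    post_only_treat:   no pretest, treatment, posttest
    post_only_control: no pretest, no treatment, posttest
    """
    effect_with_pretest = mean(pre_treat) - mean(pre_control)
    effect_without_pretest = mean(post_only_treat) - mean(post_only_control)
    return {
        "treatment_effect_with_pretest": effect_with_pretest,
        "treatment_effect_without_pretest": effect_without_pretest,
        # Pretest x treatment interaction: how much pretesting changes the apparent effect.
        "pretest_sensitization": effect_with_pretest - effect_without_pretest,
        # Main effect of being pretested, averaged over the treatment conditions.
        "pretest_main_effect": (mean(pre_treat) + mean(pre_control)
                                - mean(post_only_treat) - mean(post_only_control)) / 2,
    }


if __name__ == "__main__":
    # Made-up posttest scores, for illustration only.
    print(solomon_effects([78.0, 82.0, 80.0], [69.0, 71.0, 70.0],
                          [74.0, 76.0, 75.0], [70.0, 70.0, 70.0]))
```

The made-up scores have group means of 80, 70, 75 and 70. With them, the treatment effect is 10 points with a pretest and 5 without, so half of the apparent effect would be attributable to pretest sensitization.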


Strategies

Receptivity manipulations to minimize demand clarity (continued)
- Standardize and restrict the experimenter's communication with subjects
  - Experimenters are the main channel of demand characteristics
  - The more standardized and restricted, the better
  - E.g., computerized instructions
- Use blind procedures in testing and experimental manipulations
  - The less known, the less transmitted

Receptivity manipulations to generate alternative demands
- Elicit false hypotheses about the purpose of the research, i.e., be deceptive
  - Contrived demands
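One way to apply "use blind procedures" is to have someone other than the experimenter generate the allocation, so that the person running sessions sees only neutral codes. The following is a minimal sketch. The function name, the round-robin dealing into equal groups, and the 8-hex-digit codes are our own assumptions, not part of the lecture.

```python
import random
import secrets


def blinded_allocation(participant_ids, conditions=("treatment", "control"), seed=None):
    """Randomly allocate participants and hide their conditions behind neutral codes.

    Returns (experimenter_view, sealed_key):
      experimenter_view maps participant -> code and carries no condition information;
      sealed_key maps code -> condition and is held by a third party until analysis.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    experimenter_view, sealed_key = {}, {}
    for i, pid in enumerate(ids):
        # Deal the shuffled participants round-robin so group sizes stay equal.
        condition = conditions[i % len(conditions)]
        code = secrets.token_hex(4)  # 8 hex digits, unrelated to the condition
        while code in sealed_key:  # guard against an (unlikely) duplicate code
            code = secrets.token_hex(4)
        experimenter_view[pid] = code
        sealed_key[code] = condition
    return experimenter_view, sealed_key


if __name__ == "__main__":
    view, key = blinded_allocation([f"P{i}" for i in range(6)], seed=42)
    print(view)  # what the experimenter sees
```

The seed makes the allocation reproducible for auditing. The codes come from `secrets`, so they are not reproducible and cannot be guessed from the seed.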


Strategies

Motivation manipulations to encourage honest responding
- Give feedback on compliant behavior in a set of pre-experimental tasks to bring the subject to a state of non-acquiescence
  - More difficult, with less confidence in the manipulation outcome
  - Aim for cooperation and favorable evaluation for the true experiment
- Make the experimental setting and procedures low-keyed and non-threatening (e.g., anonymous or confidential)
  - Non-threatening, to avoid evaluation apprehension
  - Protection of privacy is important
- Encourage honest responding through self-monitoring
  - Bogus pipeline: the subject is told that a device detects lying
  - The subject will then give truthful answers
  - Not without risk

No method is really infallible, but we do need to think deeply about the problem


Non-interactional Effects

Systematic errors on the experimenter side
- Observer effects
- Interpreter effects
- Intentional effects

Observer effects
- It is not so easy to be sure that one has made accurate observations
- Observer effects are well known in astronomy
- They are accounted for in interpreting the data

Interpreter effects
- Experimenters rarely debate the accuracy of observations, but will debate interpretation
- It is difficult to state rules of accurate interpretation
- Wrong interpretation is often due to theory monogamy
  - Though theory monogamy is often advantageous

Intentional effects
- Imply dishonesty
- May cook the data too much, i.e., too good to be true
- Need a strong sense of ethics and honesty


Interactional Effects

Biosocial effects
- Gender, age, and race of the experimenter
- Subjects may respond differently to those aspects of the experimenter
- Can get different results merely by varying these factors
- Male and female experimenters may unconsciously conduct different experiments
  - Male experimenters might be more friendly towards female subjects
  - Before declaring gender differences in studies, make sure subjects were treated the same

Psychosocial effects
- Personality, temperament, etc.
- Differences in hostility, authoritarianism, status, and warmth will get different responses
- Warmer examiners tend to get better responses than cooler, more challenging ones


Interactional Effects

Situational effects
- Context and situation
- More experienced experimenters tend to get different results than less experienced ones
- Acquaintance with subjects may yield different results as well
- What happens during an experiment can cascade throughout the rest of the experiment

Modeling effects
- Often the experimenter will do a trial run of the experiment
- Sometimes the experimenter's performance becomes a factor in the subject's performance
- When the situation is ambiguous, subjects may agree with the experimenter too often
- The experimenter's behavior may have been influencing results

Self-fulfilling prophecy
- Researchers' expectations: the expectancy effect
- E.g., a teacher thinks X is bright and treats X differently from Y


Experimenter Expectancy Effects

Consider a standard type of experiment
- Conditions differ only in the experimenters' hypotheses or expectations
- E.g., a study of "bright" rats: the "bright" rats did better

Meta-analysis of 345 studies
- Mean effect size of expectancy bias of .33
- Varies according to the category of study
- Expectancy effects do occur to a considerable degree in all categories

How are expectancy effects communicated?
- Psychological climate
- Physical distance from interactants
- Frequency and duration of interactions
- Eye contact and smiling
- Verbal rewards and punishments


Expectancy Controls

Increase the number of experimenters
- Decreases learning of influence techniques
- Helps maintain blindness
- Randomizes expectancies
- Increases generality of results
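Adding experimenters randomizes expectancies only if the experimenter is not confounded with condition, that is, every experimenter runs every condition. The following is a minimal sketch that takes an existing random allocation and spreads each condition's participants evenly across experimenters. The function name and data layout are hypothetical.

```python
import random
from collections import Counter


def assign_experimenters(allocation, experimenters, seed=None):
    """Assign experimenters so each runs (nearly) equal numbers of every condition.

    allocation maps participant -> condition (e.g., from a prior random assignment).
    Returns a dict mapping participant -> experimenter.
    """
    rng = random.Random(seed)
    by_condition = {}
    for pid, cond in allocation.items():
        by_condition.setdefault(cond, []).append(pid)
    assignment = {}
    for pids in by_condition.values():
        rng.shuffle(pids)
        staff = list(experimenters)
        rng.shuffle(staff)  # vary which experimenter absorbs any leftover participant
        for i, pid in enumerate(pids):
            assignment[pid] = staff[i % len(staff)]
    return assignment


if __name__ == "__main__":
    alloc = {f"P{i}": ("treatment" if i < 6 else "control") for i in range(12)}
    who = assign_experimenters(alloc, ["E1", "E2", "E3"], seed=7)
    # Count of participants per (experimenter, condition) pair.
    print(Counter((who[p], alloc[p]) for p in alloc))
```

Experimenter can then be entered as a factor in the analysis, so differences between experimenters show up instead of hiding inside the treatment effect.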

Observing the behavior of experimenters
- Sometimes reduces expectancy effects
- Permits correction of unprogrammed behavior
- Facilitates greater standardization

Analyze experiments for their order effects
- Permits inference about changes in experimenter behavior
- Compare earlier with later sessions
  - End-of-experiment changes ("whew!", etc.)
  - Learning effect on the experimenter
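One way to compare earlier with later sessions is to do it per experimenter, splitting that experimenter's sessions by run order. The following is a minimal sketch, assuming the scores are listed in the order the sessions were run. The midpoint split and the pooled-SD standardization (a Cohen's d style effect size) are our own choices. A large value only flags a change worth examining. It does not tell you whether the change came from the experimenter or from the subjects.

```python
from statistics import mean, variance


def order_effect(scores_in_run_order):
    """Compare earlier with later sessions run by one experimenter.

    Splits the sessions at the midpoint (in the order they were run) and returns
    (mean_early, mean_late, d), where d is the late-minus-early difference divided
    by the pooled standard deviation of the two halves.
    """
    n = len(scores_in_run_order)
    if n < 4:
        raise ValueError("need at least 4 sessions, 2 per half")
    early = scores_in_run_order[: n // 2]
    late = scores_in_run_order[n // 2:]
    pooled_var = ((len(early) - 1) * variance(early)
                  + (len(late) - 1) * variance(late)) / (n - 2)
    if pooled_var == 0:
        raise ValueError("no variation within halves; d is undefined")
    d = (mean(late) - mean(early)) / pooled_var ** 0.5
    return mean(early), mean(late), d


if __name__ == "__main__":
    # Hypothetical scores from one experimenter's sessions, in run order.
    print(order_effect([10, 12, 11, 13, 15, 17, 16, 18]))
```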


Expectancy Controls

Maintaining blind contact
- Minimizes expectancy effects
- Lack of knowledge of the treatment being given makes expectancy effects unlikely

Minimizing experimenter-subject contact
- Minimizes expectancy effects
- Important: does it reduce the realism of the manipulations?
  - This affects generalization

Employing expectancy control groups
- Permits assessment of expectancy effects
- Experimenter expectancy becomes a second variable
- Gives the magnitude of the expectancy effect
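Making experimenter expectancy a second variable turns the study into a 2x2 design: treatment given or not, crossed with whether the experimenter was led to expect an effect. The sketch below computes the two main effects and their interaction from the cell means. The data layout and function name are hypothetical, and the numbers in the usage are made up for illustration. A real analysis would add a significance test, e.g., a two-way analysis of variance.

```python
from statistics import mean


def expectancy_control_effects(cells):
    """Effects in a 2x2 expectancy control design, computed from cell means.

    cells maps (treatment_given, experimenter_expects_effect) -> list of scores;
    both keys are booleans, and all four combinations must be present.
    """
    m = {key: mean(scores) for key, scores in cells.items()}
    treatment = ((m[True, True] + m[True, False]) - (m[False, True] + m[False, False])) / 2
    expectancy = ((m[True, True] + m[False, True]) - (m[True, False] + m[False, False])) / 2
    # Does the treatment effect depend on what the experimenter expected?
    interaction = (m[True, True] - m[False, True]) - (m[True, False] - m[False, False])
    return {"treatment": treatment, "expectancy": expectancy, "interaction": interaction}


if __name__ == "__main__":
    made_up = {
        (True, True): [14.0, 16.0],    # treated, experimenter expects an effect
        (True, False): [11.0, 13.0],   # treated, experimenter expects none
        (False, True): [12.0, 14.0],   # untreated, experimenter expects an effect
        (False, False): [9.0, 11.0],   # untreated, experimenter expects none
    }
    print(expectancy_control_effects(made_up))
```

With these made-up numbers, the expectancy effect (3 points) is larger than the treatment effect (2 points), and the interaction is zero. Expectancy control groups are designed to expose exactly this kind of result.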


Participant Variables

Demographic and personal characteristics
- Critical issue: groups need to be comparable
- Methods of control:
  - Random assignment
    - The easiest and surest way of scrambling all possible variables across all groups
    - Promotes but does not guarantee equivalence, particularly with small samples
  - Homogeneous sample
    - Restrict variance by narrowing the sample
    - Have to control potential confounds
    - Price: generalizability can be challenged
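Random assignment can be as simple as shuffling the participant list and dealing it into groups. The following is a minimal sketch. The function name is hypothetical, and the optional seed is there only so that an allocation can be reproduced for auditing.

```python
import random


def random_assignment(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants and deal them into groups of (nearly) equal size."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(pool):
        # Round-robin dealing: group sizes differ by at most one.
        assignment[groups[i % len(groups)]].append(person)
    return assignment


if __name__ == "__main__":
    print(random_assignment([f"dev{i}" for i in range(8)], seed=2024))
```

As the slide notes, this promotes but does not guarantee equivalence. With a small sample, one shuffle can still put most of the strongest participants in one group, which is what the matching and statistical-control techniques that follow try to address.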


Participant Variables: Matched participants

- Virtual twins in each group
- Desired size and diversity
- Rules out group differences
- Difficult to find enough people who match on more than a few variables
  - Often a narrow match, but have to be careful
- Referred to as a matched group design
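A common way to build virtual twins is to rank participants on the matching variable, pair neighbours, and let a coin flip decide which member of each pair goes to which group. The following is a minimal sketch, assuming a single numeric matching variable. Matching on several variables at once is where finding twins gets hard, as noted above. The function name and example scores are hypothetical.

```python
import random


def matched_pair_assignment(scores, seed=None):
    """Form matched pairs on one matching variable, then randomize within each pair.

    scores maps participant -> value of the matching variable (e.g., a pretest score).
    Returns (group_a, group_b). With an odd count, the last-ranked participant is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # neighbours in rank order become "twins"
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip decides which twin goes to which group
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b


if __name__ == "__main__":
    made_up = {"A": 10, "B": 11, "C": 20, "D": 21, "E": 30, "F": 31}
    print(matched_pair_assignment(made_up, seed=0))
```

Because the coin flip happens inside each pair, the matching variable is balanced by construction, while any other variables are still scrambled by chance.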

Equated groups
- Compare the groups' means, medians, and percentages on important participant variables
- Groups should not be significantly different; they should be significantly alike
- Assess with significance tests, e.g., chi-square or a Z test
- Possible strategy: drop, add, or exchange members
  - Could change the means of other variables
  - Dropping members after measurement should raise skepticism
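For a categorical participant variable, such as a hypothetical count of participants with and without industry experience in each group, a 2x2 chi-square test is one way to check that the groups are equated. The following is a minimal stdlib sketch. The function name and counts are illustrative. A non-significant result means only that no difference was detected, not that the groups are alike. Small groups give the test little power, and the chi-square approximation is unreliable when expected counts fall below about 5.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                 has trait   lacks trait
        group 1      a            b
        group 2      c            d

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    chi2 = 0.0
    for observed, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = rows[r] * cols[k] / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 df, chi2 = z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical: 12 of 20 vs 10 of 20 participants have industry experience.
    print(chi_square_2x2(12, 8, 10, 10))
```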


Participant Variables: Statistical control

- Balance secondary variables
- Treat as a covariate in analysis of covariance
  - Adjust scores for secondary effects
- Create blocking variables
  - Study the effect and see whether it interacts with the treatment variable
  - Must increase the number of cells
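Treating a secondary variable as a covariate adjusts each group's outcome mean to the value expected if the group sat at the overall covariate mean. The sketch below computes these adjusted means with the pooled within-group regression slope, which is the adjustment used in a standard one-way analysis of covariance. It is only a sketch: it omits the F test and the homogeneity-of-slopes check, and the names and data are hypothetical.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """Covariate-adjusted group means using the pooled within-group slope.

    groups maps name -> (covariate values, outcome values), paired by position.
    Returns (pooled_slope, {name: adjusted mean}).
    """
    sxx = sxy = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("covariate does not vary within groups")
    slope = sxy / sxx
    grand_x = mean([x for xs, _ in groups.values() for x in xs])
    # Shift each group's outcome mean along the common slope to the grand covariate mean.
    adjusted = {name: mean(ys) - slope * (mean(xs) - grand_x)
                for name, (xs, ys) in groups.items()}
    return slope, adjusted


if __name__ == "__main__":
    made_up = {"control": ([1, 2, 3], [2, 4, 6]),
               "treatment": ([3, 4, 5], [11, 13, 15])}
    print(ancova_adjusted_means(made_up))
```

In the made-up data, the treatment group also started higher on the covariate. Its raw advantage of 9 points (means 4 vs 13) shrinks to 5 points once both groups are adjusted to the same covariate level (6 vs 11).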

Own control
- Sampling error is the largest error built into a design that has different people in each group
- Especially useful in SWE for accounting for differences in abilities/productivity
- Some studies do not lend themselves to this kind of control, e.g., long-term studies
- Possible problems:
  - The first treatment affects the response to the second
  - Learning effects from the first test
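When participants serve as their own controls, a standard guard against learning effects is counterbalancing: a random half gets treatment A first, and the other half gets B first. The following is a minimal sketch for two treatments. The function names and the "A"/"B" labels are hypothetical.

```python
import random
from statistics import mean


def counterbalanced_orders(participants, treatments=("A", "B"), seed=None):
    """Randomly give half the participants one order and half the reverse order."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    forward = tuple(treatments)
    backward = forward[::-1]
    # Alternate through the shuffled pool so the two orders differ in size by at most one.
    return {p: forward if i % 2 == 0 else backward for i, p in enumerate(pool)}


def within_subject_effect(results, first="A", second="B"):
    """Mean of per-participant differences (first minus second).

    results maps participant -> {treatment: score}. Each participant is compared
    only with themselves, so stable differences in ability or productivity cancel.
    """
    return mean(r[first] - r[second] for r in results.values())


if __name__ == "__main__":
    print(counterbalanced_orders([f"dev{i}" for i in range(6)], seed=5))
```

Counterbalancing spreads a learning effect evenly over both treatments rather than removing it. If the first treatment changes the response to the second differently depending on which came first (asymmetric carryover), counterbalancing cannot correct for it, and order should be analyzed as a factor.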


Participant Variables: Extra-experimental changes in participants

Critical issues
- Especially in cases where considerable time elapses
- Maturation and history

Methods of control
- Cannot be prevented over a long course
- But if assignment is truly random, the odds are greater against systematic problems


Participant Variables: Motivation and role perception

Critical issues
- Are some more motivated than others?
- Are egos more involved in some than others?
- Is it important to them to be part of a study?
- Unequal benefits may result in unequal performance
- Perception of the role might differ systematically: second-guessing, scoping out, expectations

Methods of control
- Judge whether all receive the same benefits and rewards
- Keep motivation constant over time and between groups
- Use unobtrusive and non-reactive measures


Participant Variables: Communication among participants

Critical issues
- Communicating experiences to those waiting for treatment
- Possible where participants are drawn from a co-located population
- Not a problem in some cases, e.g., auditory acuity
- A concern where there are right/wrong answers or judgments

Methods of control
- Physical separation or simultaneous treatment
- With adults, explain the problem and ask for cooperation
- Pretreatment screening for possible contamination
- Participants from different places
- Work quickly and finish before communication can take place
- Monitor for communication


Participant Variables: Placebo effects

Critical issues
- Can be quite powerful effects
- Important where there are change expectancies, especially where a benefit is expected

Methods of control
- Give a placebo to a random half of the sample
- Not always appropriate, e.g., psychotherapy
- What about SWE?


Experimental Variables

Critical issues
- Interactional effects:
  - Biosocial effects: demographic, e.g., men reacting to women
  - Psychosocial effects: personal characteristics, e.g., not liking pushy people
  - Situational effects
  - Modeling effects
  - Self-fulfilling prophecies
  - Demand characteristics
- Non-interactional effects:
  - Observer effects
  - Interpreter effects
  - Personal equation: e.g., astronomers' observations differed
  - Selective: effects that differ in one group
  - Secondary variance: effects on both groups
  - Experimenter bias


Experimental Variables

Methods of control
- Institutionalized critical review process
- Experience and self-discipline
- Control biosocial effects:
  - Anticipate them
  - Rule them out or minimize them by design
  - Analyze them
  - Report them
- Psychosocial effects:
  - Same ways as for biosocial effects
  - Trial experiment and monitoring of the experimenter