User Evaluation
Transcript of User Evaluation
Alark Joshi
Optical Illusions
• Same Length
• Same color perceived differently
• Different colors perceived as the same
Motion-Induced Blindness
• Fixate on the center
– The yellow spots disappear once in a while (sometimes 1 and sometimes all 3)
http://www.michaelbach.de/ot/mot_mib/index.html
Inattentional Blindness
http://www.youtube.com/watch?v=Ahg6qcgoay4
Preattentive Processing
• Closure
• Size, Orientation, Shape, and Color
• Color, Shape
Credits: http://www.csc.ncsu.edu/faculty/healey/PP/
Need for Evaluation
• Visual system is extremely complex
• Misrepresentation of data can have catastrophic results in some cases
– Challenger disaster could have been averted – Tufte
• Need to ensure consistent interpretation of data using a visualization technique in various scenarios
• “User studies offer a scientifically sound method to measure a visualization technique’s performance” – Kosara et al.
Need for Evaluation
• Novel visualization techniques provide a unique representation of data
• How do you measure the effectiveness of the technique?
• How can we be sure that the new technique is not misrepresenting/misleading the user?
• How can we confidently say that a new visualization technique is ‘better’ than existing techniques?
User Evaluations
• Conduct a user study to evaluate which visualization technique is ‘better’
• A user study involves users performing tasks using a variety of visualization techniques
• The idea is to evaluate the visualization techniques based on the performance of the users
– For example, Yost and North ’06 (The Perceptual Scalability of Visualization)
• How do you measure the performance of a user?
User Study
• Compare Pie Charts and Bar Graphs
• Learn about the components of a user study
• Fill out the worksheet as we go along with what you think each component is
• We shall then conduct the user study with all of you as ‘willing’ participants at the end
• Analyze the results after class
Dependent Variables
• How do you measure the performance of a user?
• Performance metrics
– User accuracy at performing the tasks
– Time required to perform the tasks
– Emotional response
• Confidence that users have in their answers (which indicates the effectiveness of a technique)
• Stress caused by a particular technique
– Qualitative feedback in the form of a questionnaire that lets the user provide additional feedback
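The quantitative dependent variables above (accuracy, time, confidence) are typically summarized per participant and condition. A minimal sketch, with made-up trial data and illustrative field names:

```python
# Summarizing dependent variables per condition: accuracy, completion
# time, and self-reported confidence. The trial tuples and values here
# are purely illustrative, not from a real study.
from statistics import mean

# Each trial record: (answered correctly?, seconds taken, confidence 1-5)
trials = [
    (True, 4.2, 5),
    (False, 6.8, 3),
    (True, 5.1, 4),
]

accuracy = mean(1.0 if correct else 0.0 for correct, _, _ in trials)
mean_time = mean(t for _, t, _ in trials)
mean_confidence = mean(c for _, _, c in trials)

print(f"accuracy={accuracy:.2f}, time={mean_time:.2f}s, confidence={mean_confidence:.1f}")
```

In practice you would compute these per participant × technique cell before running any significance test.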
Conducting a User Study
• What are you testing?
– Efficacy of a novel visualization technique
– Evaluating pre-existing techniques to find the best one for a certain task
• Identify your null hypothesis
– For example, the null hypothesis would be that the novel visualization technique is not more effective than currently used techniques
Independent Variables
• What are the variables in the study that are independent?
• Variables that we manipulate in the study
• The visualization techniques that you show the user are your independent variable
– E.g. visualization technique is an independent variable with three levels if you are comparing Treemaps vs. Bar Charts vs. Pie Charts
Datasets
• The datasets chosen for the study should be standard and anonymous
• Mock/test datasets are acceptable as long as they do not favor a certain technique
• Ideally, the study should evaluate the visualization technique on a couple of real-world datasets
• Provides legitimacy to the results
Before you start the study
• Identify the tasks that users/experts would do on a regular basis
– Try to keep the tasks general and reasonably short
• Create fair visualizations of the data for all the visualization techniques involved
• Identify/solicit unbiased participants for the study
– Attempt to have a balanced gender and age pool for the study
– For expert studies, identify appropriate participants
Before you start the study
• In addition to verbally describing the study and its purpose, have a written description of the study ready for the participants
– Provides perspective to the participants
– This allows participants to leave if they feel uncomfortable with the study
• Make sure you have the questionnaire ready for users to fill out after the study is completed
• Ensure that you have ways to record data about the participants before the study commences
Ethical Considerations
• Sometimes tests can be distressing and tiring
• Ensure that you are clearly communicating to the participants that
– They can leave at any time
– They are not pressured to participate
– The data collected will be anonymous
• Every user study has to be approved by an Institutional Review Board
Capturing User Data
• Automatically recording performance metrics
• Requesting participants to ‘think aloud’
– Tell us what they are trying to do
– Tell us what questions arise as they work
– Tell us what problems are arising as they work
• Video recording the participants during the study
• Recording on-screen activity
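Automatic recording of performance metrics usually amounts to timestamping each task and appending a row of raw data. A minimal sketch; the function name, CSV layout, and the placeholder task are all illustrative, not a real study harness:

```python
# Sketch of automatically recording performance metrics: time each
# task and log one row per trial. log_trial and the column layout
# are hypothetical names for illustration.
import csv
import time

def log_trial(writer, participant, technique, task, run_task):
    """Time one task and append a row of raw data to the log."""
    start = time.perf_counter()
    answer = run_task()  # in a real study this presents the stimulus and collects input
    elapsed = time.perf_counter() - start
    writer.writerow([participant, technique, task, answer, f"{elapsed:.3f}"])

with open("trials.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "technique", "task", "answer", "seconds"])
    log_trial(writer, "P01", "bar", "read_value", lambda: 42)
```

Logging raw per-trial rows (rather than pre-aggregated summaries) keeps the later quantitative analysis flexible.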
Pilot Evaluation
• Before you start the study, it is always a good idea to
– Take the study yourself to identify potential problems with the study
– Perform a pilot evaluation with 2–3 participants to make sure the tasks, visualizations, and description are not ambiguous
• You will invariably end up finding problems, and this allows you to fix them before you start the study
Conducting the user study
1. Welcome and introduce yourself to the participant
2. Explain the entire process to the participant
3. Clearly indicate to the participant that s/he is allowed to leave the study at any time
4. Allow the participant to read the written description and obtain consent from the participant
5. Introduce the participant to the basics of the study
Conducting the user study
6. Inform the participant that you will be measuring metrics (time, accuracy, etc.)
7. Patiently answer all the questions that they may have
8. Let the participant start the study
9. Once the study has been completed, request the participant to fill out the questionnaire
10. Thank the participant for completing the study
Ordering Effects
• Ordering of conditions is a variable that can confound the results
• Randomization
– To ensure that ordering effects do not affect the outcome of the study
• Control
– To ensure that a variable is held constant for all cases
– Same datasets for multiple visualization techniques
Ordering Subjects
• Two ways of ordering the participants
• Within-subjects design
– All participants see all visualization techniques
– Ordering and fatigue effects
• Between-subjects design
– One set of participants sees one technique and performs tasks pertaining to that technique
– Cannot isolate effects due to individual differences
– Requires more participants
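In a within-subjects design, one common way to counter ordering effects is to randomize the condition order independently for each participant. A minimal sketch (condition names are illustrative; a Latin square is a common alternative to plain shuffling):

```python
# Counterbalancing a within-subjects design: each participant gets
# the conditions in an independently shuffled order. Seeding the RNG
# with the participant id makes the order reproducible for analysis.
import random

CONDITIONS = ["pie", "bar", "treemap"]

def presentation_order(participant_id: int) -> list:
    """Return a deterministic, per-participant shuffle of the conditions."""
    rng = random.Random(participant_id)
    order = CONDITIONS[:]  # copy so the master list is untouched
    rng.shuffle(order)
    return order

for pid in range(3):
    print(pid, presentation_order(pid))
```

Shuffling spreads ordering and fatigue effects evenly across conditions rather than eliminating them, which is why they are still reported as a limitation of within-subjects designs.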
Measuring Effectiveness
• Performance metrics can provide insight
– Consistently accurate results using a technique
– Significantly faster using a technique
– Higher confidence when using a technique
• Do not combine timing with ‘think aloud’
– Talking will affect speed
• Caveats
– Faster is not always better
– Higher confidence/speed but less accurate?
Pie Charts vs. Bar Graphs
• Complete the study based on the instructions given – 10 mins
• Turn in your study when you are done
Quantitative Analysis
• Identify statistical significance for performance metrics
• Student's t-test computes statistical significance if two quantities are being compared
• ANOVA (Analysis of Variance) computes statistical significance if more than two quantities are being compared
• Output in the form of a p-value
– p < 0.05 = statistically significant
– p < 0.01 = highly statistically significant
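Both tests are one-liners with SciPy. A sketch assuming SciPy is installed; the completion times below are made-up example data, not results from the class study:

```python
# t-test for two techniques, one-way ANOVA for three or more.
# All times (in seconds) are fabricated for illustration.
from scipy import stats

bar_times = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]
pie_times = [5.2, 5.6, 4.9, 5.4, 5.1, 5.8]
treemap_times = [4.6, 4.9, 4.4, 5.0, 4.7, 4.8]

# Two techniques compared: Student's t-test.
t, p = stats.ttest_ind(bar_times, pie_times)
print(f"t-test: p = {p:.4f}")   # p < 0.05 -> statistically significant

# More than two techniques compared: one-way ANOVA.
f, p_anova = stats.f_oneway(bar_times, pie_times, treemap_times)
print(f"ANOVA:  p = {p_anova:.4f}")
```

A significant ANOVA only says that some difference exists among the conditions; pairwise post-hoc tests are needed to say which techniques differ.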
Qualitative Analysis
• ‘Think aloud’ feedback is useful
• Answers to the questionnaire provide crucial feedback about preferences, difficulties with a certain technique, ideas for improvement, etc.
• Use feedback together with quantitative results
– To determine which technique is ‘better’
– Make a list of positive and negative events
– Refine the technique/product for a second round of studies
Report your Results
• Report your results in full detail
– Research paper
– Marketing brochure
• Provide sufficient detail to allow another team to conduct the same study
• Results provide legitimacy to your claims of a more effective visualization technique/system
– Example: Yost and North, Laidlaw et al., and so on
Issues of Perception and Cognition
• The advent of huge displays and computing power is overwhelming our perceptual senses
• Visualizing huge multimodal datasets on these displays may cause perceptual and cognitive overload
• Need to evaluate and design visualizations that are effective and easy to understand
Now for something completely different!
• Pixar Short – Lifted
– http://www.youtube.com/watch?v=pY1_HrhwaXU
User Studies
• Robert Kosara, Christopher G. Healey, Victoria Interrante, David H. Laidlaw, and Colin Ware. User Studies: Why, How, and When? IEEE Comput. Graph. Appl. 23, 4 (July 2003), 20–25.
• Increase the awareness of the need for user evaluation in visualization
• Discuss scenarios where user studies may be appropriate and situations where other techniques may be more appropriate (videotaping users using the system, etc.)
Blog comments
• Danny – “color sequences portion was extremely helpful and is not as emphasized as it should be in design”
• Peter – “the test subject must be presented with a clear task”
• Tim – “bigger concern is that such user evaluations could easily yield misleading results … and that is far worse than just results of “limited value”.
Other Evaluation Methods
• Expert evaluation
– Evaluate with the help of a small group of domain experts
• Interview-based evaluation (Many Eyes paper)
• Multi-dimensional In-depth Long-term Case studies (MILC)
Multi-dimensional In-depth Long-term Case studies (MILC)
• Ben Shneiderman, Catherine Plaisant. Strategies for evaluating information visualization tools: multi-dimensional in-depth long-term case studies. BELIV '06.
• Assessing insight and discovery is hard
• Efficacy can be assessed by documenting
1. usage (observations, interviews, surveys, logging, etc.)
2. expert users’ success in achieving their goals
Blog comments
• Peter – “multiple dimensions and long-term nature of the study … decreasing the likelihood of results being influenced by belief systems”
• Archana – “appreciate the authors for accepting that MILC is not “the” strategy and it also has some drawbacks”
• Tim – “I liked the guidelines laid out in Section 6 … particularly, those regarding the incorporation of feedback from within the tool itself”
Challenge of Information Visualization Evaluation
• Plaisant, C. The Challenge of Information Visualization Evaluation. Advanced Visual Interfaces, Italy, 2004, ACM Press.
• Discusses other evaluation approaches that take into account the long, exploratory nature of users' tasks
• Addresses the problems associated with evaluating benefits
Information Visualization Evaluation
• “Discovery is seldom an instantaneous event”
– Observe users over a longer period instead of a short user study with a single tool
• “Way to answer questions you didn’t know you had”
– Allow free exploration of data and report on what they learned and understood (less effective with non-experts)
– Letting subjects use their own data can increase engagement and interest
Information Visualization Evaluation
• “Factoring in the chances of discovery and the benefits of awareness”
– Potential adopters have to consider risks associated with errors caused due to visualization
– Streamlining repetitive tasks is also a benefit that adopters like in place of novel discovery
• “Success cannot be tracked back to visualization”
– Freire and Silva propose the use of provenance to track user tasks and “quantify” insight
Examples of Tech Transfer
From treemap to the Map of the Market and other treemap tools
Blog comments
• Eddie – “need to balance the general application of a tool with requirements to solve problems in specific domains”
• Danny – “Building generic and reusable toolkits should be encouraged, but the client-facing interface should be very tailored to the problem they are trying to solve”
• Tim – “never occurred to me just how hard it can be to find early adopters”
Expert Reviews
• Tory, M., Moeller, T. Evaluating Visualizations: Do Expert Reviews Work? IEEE Computer Graphics and Applications, 25(5), 2005, 8–11.
• Asking a few friends for their opinion isn't sufficient and may miss valuable information
• Expert reviews identify usability problems and are efficient (5 experts find 75% of problems)
– Compared to a study of 50 participants
Comparing light control widgets
• Heuristic evaluation that focuses on GUI issues, generic visualization tasks, and tasks specific to their tool
• Light dial (right image) facilitated faster exploration
• Sliders were better for understanding light contributions
Comparing light control widgets
• Usability experts had far more insight
• Of the experts, only two focused on data analysis tasks
– Due to their training in medical imaging
• Involving usability experts with domain knowledge is more important than just usability experts
Comparing volume rendering interfaces
• Expert reviews on two volume rendering interfaces
• Table interface vs. parallel-coordinates style interface
• Used usability experts with domain knowledge for this study
Table Interface
Image credits: T.J. Jankun-Kelly and Kwan-Liu Ma
Parallel-Coordinates Style Interface
Image credits: M. Tory and T. Moeller
Two tasks
1. Explore several datasets
2. Search for an identifiable object using the interfaces
• Experts provided written feedback along with ratings
• Observed participants and recorded their opinions throughout the study
– Led to identification of misconceptions
Comparing volume rendering interfaces
• Table interface
– Useful for quick exploration of settings
• Parallel coordinates interface
– Identifying available display options and manipulating display settings
• Small images were seen as a problem
Summary
• Recording and analyzing the observations is hard
– Explore video recording
• Highly valuable and applicable feedback
• Experts provide quick and valuable insight
• Not a substitute for user studies
• “Problems” found by experts may not affect end users
• Experts with usability experience can provide applicable feedback
Blog comments
• Danny – “judge software poorly at first glance if e.g. the widget toolkit or window decorations appeared old or block”
• Tim – “provided some guidelines regarding when it makes sense to acquire expert reviews (early on) and when ”
• Bill – “… inclined to do an expert review in preference to a lab study while developing a product.”
Seven Scenarios – InfoVis Evaluation
• Heidi Lam, Enrico Bertini, Petra Isenberg, Catherine Plaisant, Sheelagh Carpendale. Empirical Studies in Information Visualization: Seven Scenarios. TVCG, December 2011.
• Provides a descriptive view of the various evaluation scenarios in Information Visualization
• Surveyed 850 papers to categorize them into seven distinct classes (scenarios)
Scenarios
• Identify your user evaluation goals
• Provide an insight into the kinds of questions that can be asked for specific scenarios
• Each scenario contains a description of the situation in which you could use specified questions and methods to evaluate results
Scenarios
1. Understanding Environments and Work Practices (UWP)
– MILC falls into this category
– Spend time at the site of the client trying to understand the kinds of tasks, visualizations, and barriers being used
2. Evaluating Visual Data Analysis and Reasoning (VDAR)
– Study the data analysis oriented tasks (hypothesis generation, knowledge discovery, decision making)
Scenarios
3. Evaluating Communication through Visualization (CTV)
– Conveying information to a large group of people
– Aim to evaluate efficacy, usefulness, and interaction patterns
4. Evaluating Collaborative Data Analysis (CDA)
– Evaluate the ability to facilitate seamless collaboration
5. Evaluating User Performance (UP)
– Identify limits of human perception; compare visualization techniques by examining performance
Scenarios
6. Evaluating User Experience (UE)
– Usability testing in terms of useful features, missing features, improvements, learning curve
7. Evaluating Visualization Algorithms (VA)
– Evaluate the efficacy of algorithms in being able to highlight patterns, produce the least cluttered view, and scale (with large displays, huge datasets)
– Examine the performance in terms of efficiency of an algorithm
Many-to-many mapping
• Scenarios don't map directly to a single evaluation method
• Situations where your evaluation goals are broad may include multiple scenarios
– Exploring data analysts' process of knowledge discovery may include UWP, VDAR, UE, VA, and so on
Blog comments
• Peter – “… importance of effective evaluation, and by contrast, the problems created by ineffective evaluation.”
• Danny – “The paper does do a good job in describing the derived categories and explains how they are different.”
• Eddie – “provides a handy framework within which to categorize evaluation scenarios to aid in the selection of the proper evaluation method”
User study: 2D Vector Visualization
• D.H. Laidlaw, M. Kirby, C. Jackson, J.S. Davidson, T. Miller, M. DaSilva, W. Warren, and M. Tarr (2005). Comparing 2D vector field visualization methods: A user study. TVCG, 11(1):59–70, 2005.
• Conducted a thorough evaluation of expert and non-expert users for visualization techniques
• Compared six visualization methods for 2D vector visualization
• Presentation by Tim Maness