User Evaluation

User Evaluation Alark Joshi

Transcript of User Evaluation

Page 1: User Evaluation

User Evaluation

Alark Joshi

Page 2: User Evaluation

Optical Illusions

Same Length

Same color perceived differently

Different colors perceived as the same

Page 3: User Evaluation

Motion-Induced Blindness

• Fixate on the center
  – The yellow spots disappear once in a while (sometimes 1 and sometimes all 3)

http://www.michaelbach.de/ot/mot_mib/index.html

Page 4: User Evaluation

Inattentional Blindness

http://www.youtube.com/watch?v=Ahg6qcgoay4

Page 5: User Evaluation

Preattentive Processing

Closure

Size, Orientation, Shape, and Color

Color and Shape

Credits: http://www.csc.ncsu.edu/faculty/healey/PP/

Page 6: User Evaluation

Need for Evaluation

• Visual system is extremely complex
• Misrepresentation of data can have catastrophic results in some cases
  – Challenger disaster could have been averted (Tufte)
• Need to ensure consistent interpretation of data using a visualization technique in various scenarios
• "User studies offer a scientifically sound method to measure a visualization technique's performance" – Kosara et al.

Page 7: User Evaluation

Need for Evaluation

• Novel visualization techniques provide a unique representation of data
• How do you measure the effectiveness of the technique?
• How can we be sure that the new technique is not misrepresenting or misleading the user?
• How can we confidently say that a new visualization technique is 'better' than existing techniques?

Page 8: User Evaluation

User Evaluations

• Conduct a user study to evaluate which visualization technique is 'better'
• A user study involves users performing tasks using a variety of visualization techniques
• The idea is to evaluate the visualization techniques based on the performance of the users
  – For example, Yost and North '06 (The Perceptual Scalability of Visualization)
• How do you measure the performance of a user?

Page 9: User Evaluation

User Study

• Compare Pie Charts and Bar Graphs
• Learn about the components of a user study
• Fill out the worksheet as we go along with what you think each component is
• We shall then conduct the user study with all of you as 'willing' participants at the end
• Analyze the results after class

Page 10: User Evaluation

Dependent Variables

• How do you measure the performance of a user?
• Performance metrics (see the sketch below)
  – User accuracy at performing the tasks
  – Time required to perform the tasks
  – Emotional response
    • Confidence that the users have in their answers (which indicates the effectiveness of a technique)
    • Stress caused by a particular technique
  – Qualitative feedback in the form of a questionnaire that lets the user provide additional feedback
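A minimal sketch of how these dependent variables might be captured per trial, assuming a Python-based study harness; the record fields and helper functions are illustrative assumptions, not part of the original slides:

```python
import csv
import time
from dataclasses import dataclass, asdict

@dataclass
class TrialRecord:
    participant_id: str
    technique: str        # the independent variable, e.g. "pie_chart" or "bar_graph"
    task_id: str
    correct: bool         # accuracy
    time_seconds: float   # completion time
    confidence: int       # self-reported confidence, e.g. on a 1-5 scale

def run_trial(participant_id, technique, task_id, present_task, check_answer):
    """Time one task and bundle the dependent variables into a record."""
    start = time.monotonic()
    answer = present_task()          # caller-supplied: shows the task, returns the answer
    elapsed = time.monotonic() - start
    confidence = int(input("Confidence in your answer (1-5)? "))
    return TrialRecord(participant_id, technique, task_id,
                       check_answer(answer), elapsed, confidence)

def save_records(records, path="results.csv"):
    """Write all trial records to a CSV file for later analysis."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[*asdict(records[0])])
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)
```

Each row produced this way feeds the quantitative analysis later in the lecture (accuracy, time) as well as the qualitative side (confidence, questionnaire responses).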

Page 11: User Evaluation

Conducting a User Study

• What are you testing?
  – Efficacy of a novel visualization technique
  – Evaluating pre-existing techniques to find the best one for a certain task
• Identify your Null Hypothesis
  – For example, the null hypothesis would be that the novel visualization technique is not more effective than currently used techniques

Page 12: User Evaluation

Independent Variables

• What are the variables in the study that are independent?
• Variables that we manipulate in the study
• The visualization technique that you show the user is your independent variable
  – E.g., if you are comparing Treemaps vs. Bar Charts vs. Pie Charts, the independent variable (visualization technique) has three levels

Page 13: User Evaluation

Datasets

• The datasets chosen for the study should be standard and anonymous
• Mock/test datasets are acceptable as long as they do not favor a certain technique
• Ideally, the study should evaluate the visualization technique on a couple of real-world datasets
  – Provides legitimacy to the results

Page 14: User Evaluation

Before you start the study

• Identify the tasks that users/experts would do on a regular basis
  – Try to keep the tasks general and reasonably short
• Create fair visualizations of the data for all the visualization techniques involved
• Identify/solicit unbiased participants for the study
  – Attempt to have a balanced gender and age pool for the study
  – For expert studies, identify appropriate participants

Page 15: User Evaluation

Before you start the study

• In addition to verbally describing the study and its purpose, have a written description of the study ready for the participants
  – Provides perspective to the participants
  – This allows participants to leave if they feel uncomfortable with the study
• Make sure you have the questionnaire ready for users to fill out after the study is completed
• Ensure that you have ways to record data about the participants before the study commences

Page 16: User Evaluation

Ethical Considerations

• Sometimes tests can be distressing and tiring
• Ensure that you are clearly communicating to the participants that
  – They can leave at any time
  – They are not pressured to participate
  – The data collected will be anonymous
• Every user study has to be approved by an Institutional Review Board

Page 17: User Evaluation

Capturing User Data

• Automatically recording performance metrics (see the sketch below)
• Requesting participants to 'think aloud'
  – Tell us what they are trying to do
  – Tell us what questions arise as they work
  – Tell us what problems are arising as they work
• Video recording the participant's study session
• Recording on-screen activity
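A minimal sketch of automatic, timestamped event logging during a session, assuming a Python harness; the event names and file format are illustrative assumptions, not from the slides:

```python
import json
import time

class SessionLogger:
    """Append timestamped interaction events (task start/end, clicks, selections)
    to a JSON-lines file so performance metrics can be reconstructed afterwards."""

    def __init__(self, path):
        self.path = path
        self.start = time.monotonic()

    def log(self, event, **details):
        record = {"t": round(time.monotonic() - self.start, 3),
                  "event": event, **details}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Example usage during one participant's session
logger = SessionLogger("p01_session.jsonl")
logger.log("task_start", task="find_largest_category", technique="pie_chart")
logger.log("click", x=412, y=197)
logger.log("task_end", answer="Region B", correct=True)
```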

Page 18: User Evaluation

Pilot Evaluation

• Before you start the study, it is always a good idea to
  – Take the study yourself to identify potential problems with the study
  – Perform a pilot evaluation with 2-3 participants to make sure the tasks, visualizations, and description are not ambiguous
• You will invariably end up finding problems, and this allows you to fix them before you start the study

Page 19: User Evaluation

Conducting the user study

1. Welcome and introduce yourself to the participant
2. Explain the entire process to the participant
3. Clearly indicate to the participant that s/he is allowed to leave the study at any time
4. Allow the participant to read the written description and obtain consent from the participant
5. Introduce the participant to the basics of the study

Page 20: User Evaluation

Conducting the user study

6. Inform the participant that you will be measuring metrics (time, accuracy, etc.)
7. Patiently answer all the questions that they may have
8. Let the participant start the study
9. Once the study has been completed, request the participant to fill out the questionnaire
10. Thank the participant for completing the study

Page 21: User Evaluation

Ordering Effects

• Ordering of conditions is a variable that can confound the results
• Randomization (see the sketch below)
  – To ensure that ordering effects do not affect the outcome of the study
• Control
  – To ensure that a variable is held constant for all cases
  – Same datasets for multiple visualization techniques
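A minimal sketch of randomizing or counterbalancing the order of conditions, assuming Python; the condition names are illustrative, not from the in-class study:

```python
import random
import itertools

CONDITIONS = ["treemap", "bar_chart", "pie_chart"]  # illustrative technique names

def randomized_order(seed):
    """Give each participant an independently shuffled condition order."""
    rng = random.Random(seed)           # seed with the participant ID for reproducibility
    order = CONDITIONS[:]
    rng.shuffle(order)
    return order

def counterbalanced_orders(n_participants):
    """Cycle through all possible orderings so each appears (nearly) equally often."""
    all_orders = list(itertools.permutations(CONDITIONS))
    return [list(all_orders[i % len(all_orders)]) for i in range(n_participants)]

if __name__ == "__main__":
    print(randomized_order(seed=1))      # e.g. ['pie_chart', 'treemap', 'bar_chart']
    print(counterbalanced_orders(6))     # 6 participants cover all 6 orderings once
```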

Page 22: User Evaluation

Ordering Subjects

• Two ways of assigning conditions to participants
• Within-subjects design
  – All participants see all visualization techniques
  – Ordering and fatigue effects
• Between-subjects design
  – One set of participants sees one technique and performs tasks pertaining to that technique
  – Cannot isolate effects due to individual differences
  – Requires more participants

Page 23: User Evaluation

Measuring Effectiveness

• Performance metrics can provide insight
  – Consistently accurate results using a technique
  – Significantly faster using a technique
  – Higher confidence when using a technique
• Do not combine timing with 'think aloud'
  – Talking will affect speed
• Caveats
  – Faster is not always better
  – Higher confidence/speed but less accurate?

Page 24: User Evaluation

Pie Charts vs. Bar Graphs

• Complete the study based on the instructions given
  – 10 mins
• Turn in your study when you are done

Page 25: User Evaluation

Quantitative Analysis

• Identify statistical significance for performance metrics
• Student's t-test computes statistical significance when two quantities are being compared
• ANOVA (Analysis of Variance) computes statistical significance when more than two quantities are being compared
• Output is in the form of a p-value (see the sketch below)
  – p < 0.05 = statistically significant
  – p < 0.01 = highly statistically significant
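A minimal sketch of both tests on hypothetical completion times using SciPy; the numbers are made up for illustration and are not results from the in-class study:

```python
# Compare per-participant completion times (seconds) across techniques.
from scipy import stats

pie_times = [12.3, 15.1, 11.8, 14.0, 13.2]
bar_times = [9.8, 10.5, 11.0, 9.2, 10.1]
treemap_times = [13.5, 12.9, 14.2, 15.0, 13.8]

# Two conditions: independent-samples Student's t-test
t_stat, p_two = stats.ttest_ind(pie_times, bar_times)
print(f"t = {t_stat:.2f}, p = {p_two:.4f}")

# More than two conditions: one-way ANOVA
f_stat, p_anova = stats.f_oneway(pie_times, bar_times, treemap_times)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")

# Conventional thresholds from the slide
if p_two < 0.01:
    print("highly statistically significant")
elif p_two < 0.05:
    print("statistically significant")
```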

Page 26: User Evaluation

Qualitative Analysis

• 'Think aloud' feedback is useful
• Answers to the questionnaire provide crucial feedback about preferences, difficulties with a certain technique, ideas for improvement, etc.
• Use feedback together with quantitative results
  – To determine which technique is 'better'
  – Make a list of positive and negative events
  – Refine the technique/product for a second round of studies

Page 27: User Evaluation

Report your Results

• Report your results in full detail
  – Research paper
  – Marketing brochure
• Provide sufficient detail to allow another team to conduct the same study
• Results provide legitimacy to your claims of a more effective visualization technique/system
  – Example: Yost and North, Laidlaw et al., and so on

Page 28: User Evaluation

Issues of Perception and Cognition

• The advent of huge displays and computing power is overwhelming our perceptual senses
• Visualizing huge multimodal datasets on these displays may cause perceptual and cognitive overload
• Need to evaluate and design visualizations that are effective and easy to understand

Page 29: User Evaluation

Now for something completely different!

• Pixar Short – Lifted
  – http://www.youtube.com/watch?v=pY1_HrhwaXU

Page 30: User Evaluation

User Studies

• Robert Kosara, Christopher G. Healey, Victoria Interrante, David H. Laidlaw, and Colin Ware. User Studies: Why, How, and When? IEEE Computer Graphics and Applications, 23(4), July 2003, 20-25.
• Increases awareness of the need for user evaluation in visualization
• Discusses scenarios where user studies may be appropriate and situations where other techniques may be more appropriate (videotaping users using the system, etc.)

Page 31: User Evaluation

Blog comments

• Danny – "color sequences portion was extremely helpful and is not as emphasized as it should be in design"
• Peter – "the test subject must be presented with a clear task"
• Tim – "bigger concern is that such user evaluations could easily yield misleading results… and that is far worse than just results of 'limited value'"

Page 32: User Evaluation

Other Evaluation Methods

• Expert evaluation
  – Evaluate with the help of a small group of domain experts
• Interview-based evaluation (Many Eyes paper)
• Multi-dimensional In-depth Long-term Case studies (MILC)

Page 33: User Evaluation

Multi-dimensional In-depth Long-term Case studies (MILC)

• Ben Shneiderman, Catherine Plaisant. Strategies for evaluating information visualization tools: multi-dimensional in-depth long-term case studies. BELIV '06.
• Assessing insight and discovery is hard
• Efficacy can be assessed by documenting
  1. usage (observations, interviews, surveys, logging, etc.)
  2. expert users' success in achieving their goals

Page 34: User Evaluation

Blog comments

• Peter – "multiple dimensions and long-term nature of the study… decreasing the likelihood of results being influenced by belief systems"
• Archana – "appreciate the authors for accepting that MILC is not 'the' strategy and it also has some drawbacks"
• Tim – "I liked the guidelines laid out in Section 6… particularly, those regarding the incorporation of feedback from within the tool itself"

Page 35: User Evaluation

Challenge of Information Visualization Evaluation

• Plaisant, C. The Challenge of Information Visualization Evaluation. Advanced Visual Interfaces, Italy, 2004, ACM Press.
• Discusses other evaluation approaches that take into account the long, exploratory nature of users' tasks
• Addresses the problems associated with evaluating benefits

Page 36: User Evaluation

Information Visualization Evaluation

• "Discovery is seldom an instantaneous event"
  – Observe users over a longer period instead of a short user study with a single tool
• "Way to answer questions you didn't know you had"
  – Allow free exploration of data and have users report on what they learned and understood (less effective with non-experts)
  – Letting subjects use their own data can increase engagement and interest

Page 37: User Evaluation

Information Visualization Evaluation

• "Factoring in the chances of discovery and the benefits of awareness"
  – Potential adopters have to consider the risks associated with errors caused by the visualization
  – Streamlining repetitive tasks is also a benefit that adopters like in place of novel discovery
• "Success cannot be tracked back to visualization"
  – Freire and Silva propose the use of provenance to track user tasks and "quantify" insight

Page 38: User Evaluation

Examples of Tech Transfer

From treemap to the Map of the Market and other treemap tools

Page 39: User Evaluation

Blog comments

• Eddie – "need to balance the general application of a tool with requirements to solve problems in specific domains"
• Danny – "Building generic and reusable toolkits should be encouraged, but the client-facing interface should be very tailored to the problem they are trying to solve"
• Tim – "never occurred to me just how hard it can be to find early adopters"

Page 40: User Evaluation

Expert Reviews

• Tory, M., Moeller, T. Evaluating Visualizations: Do Expert Reviews Work? IEEE Computer Graphics and Applications, 25(5), 2005, 8-11.
• Asking a few friends for their opinion isn't sufficient and may miss valuable information
• Expert reviews identify usability problems and are efficient (5 experts find 75% of the problems)
  – Compared to a study of 50 participants

Page 41: User Evaluation

Comparinglightcontrolwidgets

• HeuristicevaluationthatfocusesonGUIissues,genericvisualizationtasksandtasksspecifictotheirtool

• Lightdial(rightimage)facilitatedfasterexploration• Slidersbetterforunderstandinglightcontributions

Page 42: User Evaluation

Comparinglightcontrolwidgets

• Usabilityexpertshadfarmoreinsight• Oftheexperts,onlytwofocusedondataanalysistasks– Duetotheirtraininginmedicalimaging

• Involvingusabilityexpertswithdomainknowledgeismoreimportantthanjustusabilityexperts

Page 43: User Evaluation

Comparing volume rendering interfaces

• Expert reviews on two volume rendering interfaces
• Table interface vs. parallel-coordinates style interface
• Used usability experts with domain knowledge for this study

Page 44: User Evaluation

Table Interface

Image credits: T.J. Jankun-Kelly and Kwan-Liu Ma

Page 45: User Evaluation

Parallel-Coordinates Style Interface

Image credits: M. Tory and T. Moeller

Page 46: User Evaluation

Two tasks

1. Explore several datasets
2. Search for an identifiable object using the interfaces

• Experts provided written feedback along with ratings
• Observed participants and recorded their opinions throughout the study
  – Led to identification of misconceptions

Page 47: User Evaluation

Comparing volume rendering interfaces

• Table interface
  – Useful for quick exploration of settings
• Parallel coordinates interface
  – Identifying available display options and manipulating display settings
• Small images were seen as a problem

Page 48: User Evaluation

Summary

• Recording and analyzing the observations is hard
  – Explore video recording
• Highly valuable and applicable feedback
• Experts provide quick and valuable insight
• Not a substitute for user studies
• "Problems" found by experts may not affect end users
• Experts with usability experience can provide applicable feedback

Page 49: User Evaluation

Blog comments

• Danny – "judge software poorly at first glance if e.g. the widget toolkit or window decorations appeared old or block"
• Tim – "provided some guidelines regarding when it makes sense to acquire expert reviews (early on) and when…"
• Bill – "…inclined to do an expert review in preference to a lab study while developing a product."

Page 50: User Evaluation

Seven Scenarios – InfoVis Evaluation

• Heidi Lam, Enrico Bertini, Petra Isenberg, Catherine Plaisant, Sheelagh Carpendale. Empirical Studies in Information Visualization: Seven Scenarios. TVCG, December 2011.
• Provides a descriptive view of the various evaluation scenarios in Information Visualization
• Surveyed 850 papers to categorize them into seven distinct classes (scenarios)

Page 51: User Evaluation

Scenarios

• Identify your user evaluation goals
• Provide an insight into the kinds of questions that can be asked for specific scenarios
• Each scenario contains a description of the situation in which you could use specified questions and methods to evaluate results

Page 52: User Evaluation

Scenarios

1. Understanding Environments and Work Practices (UWP)
   – MILC falls into this category
   – Spend time at the site of the client trying to understand the kinds of tasks and visualizations being used and the barriers involved
2. Evaluating Visual Data Analysis and Reasoning (VDAR)
   – Study the data-analysis-oriented tasks (hypothesis generation, knowledge discovery, decision making)

Page 53: User Evaluation

Scenarios

3. Evaluating Communication through Visualization (CTV)
   – Conveying information to a large group of people
   – Aim to evaluate efficacy, usefulness, and interaction patterns
4. Evaluating Collaborative Data Analysis (CDA)
   – Evaluate the ability to facilitate seamless collaboration
5. Evaluating User Performance (UP)
   – Identify limits of human perception, compare visualization techniques by examining performance

Page 54: User Evaluation

Scenarios

6. Evaluating User Experience (UE)
   – Usability testing in terms of useful features, missing features, improvements, learning curve
7. Evaluating Visualization Algorithms (VA)
   – Evaluate the efficacy of algorithms in being able to highlight patterns, produce the least cluttered view, and scale (with large displays, huge datasets)
   – Examine the performance in terms of the efficiency of an algorithm

Page 55: User Evaluation

Many-to-many mapping

• Scenarios don't map directly to a single evaluation method
• Situations where your evaluation goals are broad may include multiple scenarios
  – Exploring data analysts' process of knowledge discovery may include UWP, VDAR, UE, VA, and so on

Page 56: User Evaluation

Blog comments

• Peter – "…importance of effective evaluation, and by contrast, the problems created by ineffective evaluation."
• Danny – "The paper does do a good job in describing the derived categories and explains how they are different."
• Eddie – "provides a handy framework within which to categorize evaluation scenarios to aid in the selection of the proper evaluation method"

Page 57: User Evaluation

User study: 2D Vector Visualization

• D. H. Laidlaw, M. Kirby, C. Jackson, J. S. Davidson, T. Miller, M. DaSilva, W. Warren, and M. Tarr. Comparing 2D vector field visualization methods: A user study. TVCG, 11(1):59-70, 2005.
• Conducted a thorough evaluation of expert and non-expert users for visualization techniques
• Compared six visualization methods for 2D vector visualization
• Presentation by Tim Maness