User Evaluation
Transcript of User Evaluation
Alark Joshi
Optical Illusions
• Same Length
• Same color perceived differently
• Different colors perceived as the same
Motion-Induced Blindness
• Fixate on the center
– The yellow spots disappear once in a while (sometimes 1 and sometimes all 3)
http://www.michaelbach.de/ot/mot_mib/index.html
Inattentional Blindness
http://www.youtube.com/watch?v=Ahg6qcgoay4
Preattentive Processing
• Closure
• Size, Orientation, Shape, and Color
• Color, Shape
Credits: http://www.csc.ncsu.edu/faculty/healey/PP/
Need for Evaluation
• Visual system is extremely complex
• Misrepresentation of data can have catastrophic results in some cases
– Challenger disaster could have been averted – Tufte
• Need to ensure consistent interpretation of data using a visualization technique in various scenarios
• “User studies offer a scientifically sound method to measure a visualization technique’s performance” – Kosara et al.
Need for Evaluation
• Novel visualization techniques provide a unique representation of data
• How do you measure the effectiveness of the technique?
• How can we be sure that the new technique is not misrepresenting/misleading the user?
• How can we confidently say that a new visualization technique is ‘better’ than existing techniques?
User Evaluations
• Conduct a user study to evaluate which visualization technique is ‘better’
• A user study involves users performing tasks using a variety of visualization techniques
• The idea is to evaluate the visualization techniques based on the performance of the users
– For example, Yost and North ’06 (The Perceptual Scalability of Visualization)
• How do you measure the performance of a user?
User Study
• Compare Pie Charts and Bar Graphs
• Learn about the components of a user study
• Fill out the worksheet as we go along with what you think each component is
• We shall then conduct the user study with all of you as ‘willing’ participants at the end
• Analyze the results after class
Dependent Variables
• How do you measure the performance of a user?
• Performance metrics
– User accuracy at performing the tasks
– Time required to perform the tasks
– Emotional response
• Confidence that users have in their answers (which indicates the effectiveness of a technique)
• Stress caused by a particular technique
– Qualitative feedback in the form of a questionnaire that lets the user provide additional feedback
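The quantitative dependent variables above (accuracy, time, confidence) are typically summarized per participant and condition. A minimal sketch, with made-up trial data and illustrative field names:

```python
# Summarizing dependent variables per condition: accuracy, completion
# time, and self-reported confidence. The trial tuples and values here
# are purely illustrative, not from a real study.
from statistics import mean

# Each trial record: (answered correctly?, seconds taken, confidence 1-5)
trials = [
    (True, 4.2, 5),
    (False, 6.8, 3),
    (True, 5.1, 4),
]

accuracy = mean(1.0 if correct else 0.0 for correct, _, _ in trials)
mean_time = mean(t for _, t, _ in trials)
mean_confidence = mean(c for _, _, c in trials)

print(f"accuracy={accuracy:.2f}, time={mean_time:.2f}s, confidence={mean_confidence:.1f}")
```

In practice you would compute these per participant × technique cell before running any significance test.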
Conducting a User Study
• What are you testing?
– Efficacy of a novel visualization technique
– Evaluating pre-existing techniques to find the best one for a certain task
• Identify your null hypothesis
– For example, the null hypothesis would be that the novel visualization technique is not more effective than currently used techniques
Independent Variables
• What are the variables in the study that are independent?
• Variables that we manipulate in the study
• The visualization techniques that you show the user are your independent variable
– E.g. visualization technique is an independent variable with three levels if you are comparing Treemaps vs. Bar Charts vs. Pie Charts
Datasets
• The datasets chosen for the study should be standard and anonymous
• Mock/test datasets are acceptable as long as they do not favor a certain technique
• Ideally, the study should evaluate the visualization technique on a couple of real-world datasets
• Provides legitimacy to the results
Before you start the study
• Identify the tasks that users/experts would do on a regular basis
– Try to keep the tasks general and reasonably short
• Create fair visualizations of the data for all the visualization techniques involved
• Identify/solicit unbiased participants for the study
– Attempt to have a balanced gender and age pool for the study
– For expert studies, identify appropriate participants
Before you start the study
• In addition to verbally describing the study and its purpose, have a written description of the study ready for the participants
– Provides perspective to the participants
– This allows participants to leave if they feel uncomfortable with the study
• Make sure you have the questionnaire ready for users to fill out after the study is completed
• Ensure that you have ways to record data about the participants before the study commences
Ethical Considerations
• Sometimes tests can be distressing and tiring
• Ensure that you are clearly communicating to the participants that
– They can leave at any time
– They are not pressured to participate
– The data collected will be anonymous
• Every user study has to be approved by an Institutional Review Board
Capturing User Data
• Automatically recording performance metrics
• Requesting participants to ‘think aloud’
– Tell us what they are trying to do
– Tell us what questions arise as they work
– Tell us what problems are arising as they work
• Video recording the participants during the study
• Recording on-screen activity
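Automatic recording of performance metrics usually amounts to timestamping each task and appending a row of raw data. A minimal sketch; the function name, CSV layout, and the placeholder task are all illustrative, not a real study harness:

```python
# Sketch of automatically recording performance metrics: time each
# task and log one row per trial. log_trial and the column layout
# are hypothetical names for illustration.
import csv
import time

def log_trial(writer, participant, technique, task, run_task):
    """Time one task and append a row of raw data to the log."""
    start = time.perf_counter()
    answer = run_task()  # in a real study this presents the stimulus and collects input
    elapsed = time.perf_counter() - start
    writer.writerow([participant, technique, task, answer, f"{elapsed:.3f}"])

with open("trials.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant", "technique", "task", "answer", "seconds"])
    log_trial(writer, "P01", "bar", "read_value", lambda: 42)
```

Logging raw per-trial rows (rather than pre-aggregated summaries) keeps the later quantitative analysis flexible.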
Pilot Evaluation
• Before you start the study, it is always a good idea to
– Take the study yourself to identify potential problems with the study
– Perform a pilot evaluation with 2–3 participants to make sure the tasks, visualizations, and description are not ambiguous
• You will invariably end up finding problems, and this allows you to fix them before you start the study
Conducting the user study
1. Welcome and introduce yourself to the participant
2. Explain the entire process to the participant
3. Clearly indicate to the participant that s/he is allowed to leave the study at any time
4. Allow the participant to read the written description and obtain consent from the participant
5. Introduce the participant to the basics of the study
Conducting the user study
6. Inform the participant that you will be measuring metrics (time, accuracy, etc.)
7. Patiently answer all the questions that they may have
8. Let the participant start the study
9. Once the study has been completed, request the participant to fill out the questionnaire
10. Thank the participant for completing the study
Ordering Effects
• Ordering of conditions is a variable that can confound the results
• Randomization
– To ensure that ordering effects do not affect the outcome of the study
• Control
– To ensure that a variable is held constant for all cases
– Same datasets for multiple visualization techniques
Ordering Subjects
• Two ways of ordering the participants
• Within-subjects design
– All participants see all visualization techniques
– Ordering and fatigue effects
• Between-subjects design
– One set of participants sees one technique and performs tasks pertaining to that technique
– Cannot isolate effects due to individual differences
– Requires more participants
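In a within-subjects design, one common way to counter ordering effects is to randomize the condition order independently for each participant. A minimal sketch (condition names are illustrative; a Latin square is a common alternative to plain shuffling):

```python
# Counterbalancing a within-subjects design: each participant gets
# the conditions in an independently shuffled order. Seeding the RNG
# with the participant id makes the order reproducible for analysis.
import random

CONDITIONS = ["pie", "bar", "treemap"]

def presentation_order(participant_id: int) -> list:
    """Return a deterministic, per-participant shuffle of the conditions."""
    rng = random.Random(participant_id)
    order = CONDITIONS[:]  # copy so the master list is untouched
    rng.shuffle(order)
    return order

for pid in range(3):
    print(pid, presentation_order(pid))
```

Shuffling spreads ordering and fatigue effects evenly across conditions rather than eliminating them, which is why they are still reported as a limitation of within-subjects designs.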
Measuring Effectiveness
• Performance metrics can provide insight
– Consistently accurate results using a technique
– Significantly faster using a technique
– Higher confidence when using a technique
• Do not combine timing with ‘think aloud’
– Talking will affect speed
• Caveats
– Faster is not always better
– Higher confidence/speed but less accurate?
Pie Charts vs. Bar Graphs
• Complete the study based on the instructions given – 10 mins
• Turn in your study when you are done
Quantitative Analysis
• Identify statistical significance for performance metrics
• Student's t-test computes statistical significance if two quantities are being compared
• ANOVA (Analysis of Variance) computes statistical significance if more than two quantities are being compared
• Output in the form of a p-value
– p < 0.05 = statistically significant
– p < 0.01 = highly statistically significant
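Both tests are one-liners with SciPy. A sketch assuming SciPy is installed; the completion times below are made-up example data, not results from the class study:

```python
# t-test for two techniques, one-way ANOVA for three or more.
# All times (in seconds) are fabricated for illustration.
from scipy import stats

bar_times = [4.1, 3.8, 4.5, 4.0, 3.9, 4.3]
pie_times = [5.2, 5.6, 4.9, 5.4, 5.1, 5.8]
treemap_times = [4.6, 4.9, 4.4, 5.0, 4.7, 4.8]

# Two techniques compared: Student's t-test.
t, p = stats.ttest_ind(bar_times, pie_times)
print(f"t-test: p = {p:.4f}")   # p < 0.05 -> statistically significant

# More than two techniques compared: one-way ANOVA.
f, p_anova = stats.f_oneway(bar_times, pie_times, treemap_times)
print(f"ANOVA:  p = {p_anova:.4f}")
```

A significant ANOVA only says that some difference exists among the conditions; pairwise post-hoc tests are needed to say which techniques differ.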
Qualitative Analysis
• ‘Think aloud’ feedback is useful
• Answers to the questionnaire provide crucial feedback about preferences, difficulties with a certain technique, ideas for improvement, etc.
• Use feedback together with quantitative results
– To determine which technique is ‘better’
– Make a list of positive and negative events
– Refine the technique/product for a second round of studies
Report your Results
• Report your results in full detail
– Research paper
– Marketing brochure
• Provide sufficient detail to allow another team to conduct the same study
• Results provide legitimacy to your claims of a more effective visualization technique/system
– Example: Yost and North, Laidlaw et al., and so on
Issues of Perception and Cognition
• The advent of huge displays and computing power is overwhelming our perceptual senses
• Visualizing huge multimodal datasets on these displays may cause perceptual and cognitive overload
• Need to evaluate and design visualizations that are effective and easy to understand
Now for something completely different!
• Pixar Short – Lifted
– http://www.youtube.com/watch?v=pY1_HrhwaXU
User Studies
• Robert Kosara, Christopher G. Healey, Victoria Interrante, David H. Laidlaw, and Colin Ware. User Studies: Why, How, and When? IEEE Comput. Graph. Appl. 23, 4 (July 2003), 20–25.
• Increase the awareness of the need for user evaluation in visualization
• Discuss scenarios where user studies may be appropriate and situations where other techniques may be more appropriate (videotaping users using the system, etc.)
Blog comments
• Danny – “color sequences portion was extremely helpful and is not as emphasized as it should be in design”
• Peter – “the test subject must be presented with a clear task”
• Tim – “bigger concern is that such user evaluations could easily yield misleading results … and that is far worse than just results of “limited value”.
Other Evaluation Methods
• Expert evaluation
– Evaluate with the help of a small group of domain experts
• Interview-based evaluation (Many Eyes paper)
• Multi-dimensional In-depth Long-term Case studies (MILC)
Multi-dimensional In-depth Long-term Case studies (MILC)
• Ben Shneiderman, Catherine Plaisant. Strategies for evaluating information visualization tools: multi-dimensional in-depth long-term case studies. BELIV '06.
• Assessing insight and discovery is hard
• Efficacy can be assessed by documenting
1. usage (observations, interviews, surveys, logging, etc.)
2. expert users’ success in achieving their goals
Blog comments
• Peter – “multiple dimensions and long-term nature of the study … decreasing the likelihood of results being influenced by belief systems”
• Archana – “appreciate the authors for accepting that MILC is not “the” strategy and it also has some drawbacks”
• Tim – “I liked the guidelines laid out in Section 6 … particularly, those regarding the incorporation of feedback from within the tool itself”
Challenge of Information Visualization Evaluation
• Plaisant, C. The Challenge of Information Visualization Evaluation. Advanced Visual Interfaces, Italy, 2004, ACM Press.
• Discusses other evaluation approaches that take into account the long, exploratory nature of users' tasks
• Addresses the problems associated with evaluating benefits
Information Visualization Evaluation
• “Discovery is seldom an instantaneous event”
– Observe users over a longer period instead of a short user study with a single tool
• “Way to answer questions you didn’t know you had”
– Allow free exploration of data and report on what they learned and understood (less effective with non-experts)
– Letting subjects use their own data can increase engagement and interest
Information Visualization Evaluation
• “Factoring in the chances of discovery and the benefits of awareness”
– Potential adopters have to consider risks associated with errors caused due to visualization
– Streamlining repetitive tasks is also a benefit that adopters like in place of novel discovery
• “Success cannot be tracked back to visualization”
– Freire and Silva propose the use of provenance to track user tasks and “quantify” insight
Examples of Tech Transfer
From treemap to the Map of the Market and other treemap tools
Blog comments
• Eddie – “need to balance the general application of a tool with requirements to solve problems in specific domains”
• Danny – “Building generic and reusable toolkits should be encouraged, but the client-facing interface should be very tailored to the problem they are trying to solve”
• Tim – “never occurred to me just how hard it can be to find early adopters”
Expert Reviews
• Tory, M., Moeller, T. Evaluating Visualizations: Do Expert Reviews Work? IEEE Computer Graphics and Applications, 25(5), 2005, 8–11.
• Asking a few friends for their opinion isn't sufficient and may miss valuable information
• Expert reviews identify usability problems and are efficient (5 experts find 75% of problems)
– Compared to a study of 50 participants
Comparing light control widgets
• Heuristic evaluation that focuses on GUI issues, generic visualization tasks, and tasks specific to their tool
• Light dial (right image) facilitated faster exploration
• Sliders were better for understanding light contributions
Comparing light control widgets
• Usability experts had far more insight
• Of the experts, only two focused on data analysis tasks
– Due to their training in medical imaging
• Involving usability experts with domain knowledge is more important than just usability experts
Comparing volume rendering interfaces
• Expert reviews on two volume rendering interfaces
• Table interface vs. parallel-coordinates style interface
• Used usability experts with domain knowledge for this study
Table Interface
Image credits: T.J. Jankun-Kelly and Kwan-Liu Ma
Parallel-Coordinates Style Interface
Image credits: M. Tory and T. Moeller
Two tasks
1. Explore several datasets
2. Search for an identifiable object using the interfaces
• Experts provided written feedback along with ratings
• Observed participants and recorded their opinions throughout the study
– Led to identification of misconceptions
Comparing volume rendering interfaces
• Table interface
– Useful for quick exploration of settings
• Parallel coordinates interface
– Identifying available display options and manipulating display settings
• Small images were seen as a problem
Summary
• Recording and analyzing the observations is hard
– Explore video recording
• Highly valuable and applicable feedback
• Experts provide quick and valuable insight
• Not a substitute for user studies
• “Problems” found by experts may not affect end users
• Experts with usability experience can provide applicable feedback
Blog comments
• Danny – “judge software poorly at first glance if e.g. the widget toolkit or window decorations appeared old or block”
• Tim – “provided some guidelines regarding when it makes sense to acquire expert reviews (early on) and when ”
• Bill – “… inclined to do an expert review in preference to a lab study while developing a product.”
Seven Scenarios – InfoVis Evaluation
• Heidi Lam, Enrico Bertini, Petra Isenberg, Catherine Plaisant, Sheelagh Carpendale. Empirical Studies in Information Visualization: Seven Scenarios. TVCG, December 2011.
• Provides a descriptive view of the various evaluation scenarios in Information Visualization
• Surveyed 850 papers to categorize them into seven distinct classes (scenarios)
Scenarios
• Identify your user evaluation goals
• Provide an insight into the kinds of questions that can be asked for specific scenarios
• Each scenario contains a description of the situation in which you could use specified questions and methods to evaluate results
Scenarios
1. Understanding Environments and Work Practices (UWP)
– MILC falls into this category
– Spend time at the site of the client trying to understand the kinds of tasks, visualizations, and barriers being used
2. Evaluating Visual Data Analysis and Reasoning (VDAR)
– Study the data analysis oriented tasks (hypothesis generation, knowledge discovery, decision making)
Scenarios
3. Evaluating Communication through Visualization (CTV)
– Conveying information to a large group of people
– Aim to evaluate efficacy, usefulness, and interaction patterns
4. Evaluating Collaborative Data Analysis (CDA)
– Evaluate the ability to facilitate seamless collaboration
5. Evaluating User Performance (UP)
– Identify limits of human perception; compare visualization techniques by examining performance
Scenarios
6. Evaluating User Experience (UE)
– Usability testing in terms of useful features, missing features, improvements, learning curve
7. Evaluating Visualization Algorithms (VA)
– Evaluate the efficacy of algorithms in being able to highlight patterns, produce the least cluttered view, and scale (with large displays, huge datasets)
– Examine the performance in terms of efficiency of an algorithm
Many-to-many mapping
• Scenarios don't map directly to a single evaluation method
• Situations where your evaluation goals are broad may include multiple scenarios
– Exploring data analysts' process of knowledge discovery may include UWP, VDAR, UE, VA, and so on
Blog comments
• Peter – “… importance of effective evaluation, and by contrast, the problems created by ineffective evaluation.”
• Danny – “The paper does do a good job in describing the derived categories and explains how they are different.”
• Eddie – “provides a handy framework within which to categorize evaluation scenarios to aid in the selection of the proper evaluation method”
User study: 2D Vector Visualization
• D.H. Laidlaw, M. Kirby, C. Jackson, J.S. Davidson, T. Miller, M. DaSilva, W. Warren, and M. Tarr (2005). Comparing 2D vector field visualization methods: A user study. TVCG, 11(1):59–70, 2005.
• Conducted a thorough evaluation of expert and non-expert users for visualization techniques
• Compared six visualization methods for 2D vector visualization
• Presentation by Tim Maness