Post on 19-Jan-2016
An Evaluation of Microarray Visualization Tools for Biological Insight
Presented by Tugrul Ince and Nir Peer, University of Maryland
Purvi Saraiya, Chris North
Dept. of Computer Science, Virginia Polytechnic Institute and State University
Karen Duca
Virginia Bioinformatics Institute
Virginia Polytechnic Institute and State University
2
Goals
Evaluate five popular visualization tools: Cluster/TreeView, TimeSearcher, Hierarchical Clustering Explorer (HCE), Spotfire, and GeneSpring
Evaluate them in the context of bioinformatics data exploration
3
Goals
Research questions:
How successful are these tools in stimulating insight?
How do different visualization techniques affect users’ perception of the data?
How does users’ background affect tool usage?
How do these tools support hypothesis generation?
Can insight be measured in a controlled experiment?
4
Visualization Evaluations
Typically, evaluations consist of controlled measurements of user performance and accuracy on predetermined tasks
We want an evaluation that better simulates a bioinformatics data analysis scenario
We use a protocol that focuses on recognizing and quantifying the insights gained from actual exploratory use of the visualizations
5
Insights
It is hard to define what an “insight” is
We need this term to be quantifiable and reproducible
Solution:
Encourage users to think aloud and report any findings they have about the dataset
Videotape each session to capture and characterize individual insights as they occur
This generally provides more information than subjective measures from post-experiment surveys
6
Insights
Define an insight as:
an individual observation about the data by the participant
a unit of discovery
Essentially, any data observation made during the think-aloud protocol
Now we can quantify some characteristics of each insight
7
Insight Characteristics
Observation: the actual finding about the data
Time: the amount of time taken to reach the insight
Domain Value: the significance of the insight, coded by a domain expert
Hypotheses: hypotheses and directions of research that the insight suggests
Directed vs. Unexpected: recall that participants are asked to identify questions they want to explore; directed insights answer those questions, unexpected ones arise outside them
Correctness
Breadth vs. Depth
8
Insight Characteristics
Category:
Overview – overall distributions of gene expression
Patterns – identification or comparison across data attributes
Groups – identification or comparison of groups of genes
Details – focused information about specific genes
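One way to make the coding scheme above concrete is to keep one record per coded insight. The Python sketch below is only an illustration: the class names, field names, the integer domain-value scale, and the boolean encodings are all assumptions, not the study's actual coding sheet.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Insight categories from the coding scheme (names are illustrative)."""
    OVERVIEW = "overview"  # overall distributions of gene expression
    PATTERNS = "patterns"  # identification or comparison across data attributes
    GROUPS = "groups"      # identification or comparison of groups of genes
    DETAILS = "details"    # focused information about specific genes


@dataclass(frozen=True)
class Insight:
    """One coded insight; every field name and encoding here is an assumption."""
    observation: str          # the actual finding about the data
    minutes: float            # time into the session at which it was reported
    domain_value: int         # significance as coded by a domain expert (scale assumed)
    category: Category
    led_to_hypothesis: bool = False
    directed: bool = True     # False means unexpected (not tied to a listed question)
    correct: bool = True
    breadth: bool = True      # True = breadth insight, False = depth insight


example = Insight("Genes in one cluster rise together over time", 12.5, 3, Category.PATTERNS)
print(example.category.value, example.directed)  # → patterns True
```

Freezing the record reflects that an insight, once coded from the video, is treated as a fixed observation; subjective fields such as domain value and breadth vs. depth remain the coder's judgment.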
9
Experiment Design
A 3 × 5 between-subjects design
Between-subjects: different subjects for each (dataset, tool) pair
Dataset: 3 treatments
Visualization tool: 5 treatments
10
Experiment Design
Participants:
2 participants per dataset per tool
At least a Bachelor’s degree in a biological field
Assigned to tools they had never worked with before, to prevent an advantage from prior experience and to measure learning time
Categories:
10 domain experts: senior researchers with extensive experience in microarray experiments and microarray data analysis
11 domain novices: lab technicians or graduate student research assistants
9 software developers: professionals who implement microarray software tools
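The bookkeeping of the 3 × 5 between-subjects design can be sketched as follows: 15 cells with 2 participants each require 30 participants, which matches the pool of 10 + 11 + 9. The dataset labels, participant IDs, and the round-robin `assign` helper are hypothetical; the study's real assignment also ensured that each participant got a tool they had never used, which this sketch does not model.

```python
from collections import Counter
from itertools import product

# Hypothetical labels: the slides do not name the three datasets here.
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]
TOOLS = ["ClusterView", "TimeSearcher 1", "HCE", "Spotfire", "GeneSpring"]
PER_CELL = 2  # participants per dataset per tool

# Each (dataset, tool) pair is one cell of the 3 x 5 between-subjects design.
CELLS = list(product(DATASETS, TOOLS))

# Participant pool by background, as listed on the slide.
POOL = {"domain experts": 10, "domain novices": 11, "software developers": 9}


def assign(participant_ids, cells, per_cell):
    """Place every participant in exactly one cell, round-robin (hypothetical helper).

    This ignores the study's constraint that participants get a tool they
    have never used; it only shows the between-subjects bookkeeping.
    """
    if len(participant_ids) != len(cells) * per_cell:
        raise ValueError("pool size must equal number of cells x per_cell")
    return {pid: cells[i % len(cells)] for i, pid in enumerate(participant_ids)}


participants = [f"P{n:02d}" for n in range(1, sum(POOL.values()) + 1)]
assignment = assign(participants, CELLS, PER_CELL)
print(len(CELLS), len(assignment), set(Counter(assignment.values()).values()))
# → 15 30 {2}
```

Because each participant maps to exactly one cell, no one sees more than one tool or dataset, which is what makes the design between-subjects.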
11
Protocol and Measures
Chose new users with only minimal tool training
Success in the initial usage period is critical for a tool’s adoption by biologists
Participants received initial training: a background description of the dataset and a 15-minute tool tutorial
Participants listed some analysis questions
They were instructed to examine the data with the tool for as long as needed
They were allowed to ask for help with the tool
This simulates training by colleagues
12
Protocol and Measures
Every 15 minutes, participants estimated the percentage of the total potential insight they had obtained so far
Finally, they assessed their overall experience with the tool during the session
The entire session was videotaped for later analysis
Later, all individual occurrences of insights were identified and coded
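Once a session's insights are coded, the per-tool measures reported in the results can be computed from simple lists. This is a minimal sketch under stated assumptions: the function names and input layout are hypothetical, "final amount learned" is assumed to be the last 15-minute self-estimate, and the numbers are toy values, not data from the study.

```python
from statistics import mean


def session_measures(insight_minutes, domain_values, estimates, total_minutes):
    """Summarize one coded session (hypothetical input layout).

    insight_minutes: time from session start of each coded insight
    domain_values: expert-coded value of each insight, same order
    estimates: self-reported percent of total potential insight, one per 15-minute check
    total_minutes: total time the participant spent with the tool
    """
    return {
        "num_insights": len(insight_minutes),
        "total_domain_value": sum(domain_values),
        "time_to_first_insight": min(insight_minutes) if insight_minutes else None,
        "total_time": total_minutes,
        # Assumption: "final amount learned" is the last self-reported estimate.
        "final_amount_learned": estimates[-1] if estimates else None,
    }


def average(sessions, key):
    """Mean of one measure over sessions, skipping sessions where it is undefined."""
    values = [s[key] for s in sessions if s[key] is not None]
    return mean(values) if values else None


# Toy numbers for illustration only; these are not results from the study.
a = session_measures([4.0, 11.5, 30.0], [2, 3, 1], [20, 45, 60], 52)
b = session_measures([9.0], [4], [30, 50], 35)
print(average([a, b], "num_insights"), average([a, b], "time_to_first_insight"))
```

Returning `None` for a session with no insights keeps such sessions from dragging the average time to first insight toward zero.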
13
Show me pictures
Here are the tools!!!
14
Cluster/TreeView = ClusterView
Cluster: clusters the data
TreeView: visualizes the clusters using heat maps
15
TimeSearcher 1
Parallel coordinate visualization
Interactive filtering
Line graphs for each data entity
16
HCE
Clusters data
Several visualizations: heat maps, parallel coordinates, scatter plots, histograms
Brushing and linking
17
Spotfire
General-purpose visualization tool
Several displays: scatter plots, bar graphs, histograms, pie/line charts, others…
Dynamic query sliders
Brushing and linking
18
GeneSpring
Suitable for microarray data analysis
Shows physical positions on genomes, array layouts, and pathways
Gene-to-gene comparison
Brushing and linking
Clustering capability
19
Enough about Tools,
Tell me the Results!!!
20
Number of Insights
[Bar chart: number of insights per tool (ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring)]
Spotfire: highest number of insights
HCE: fewest insights
21
Total Domain Value
Spotfire: highest total insight value
HCE, GeneSpring: lower total value
[Bar chart: total domain value per tool (ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring)]
22
Avg. Final Amount Learned
Spotfire: high amount learned
ClusterView and HCE: low amount learned
[Bar chart: average final amount learned per tool (ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring)]
23
Avg. Time to First Insight
ClusterView: very short time to first insight
TimeSearcher 1 and Spotfire are also quick
[Bar chart: average time to first insight per tool (ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring)]
24
Avg. Total Time
Total time users spent using the tool
Low values can mean either that the tool is efficient or that it is not useful for generating insight
[Bar chart: average total time per tool (ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring)]
25
Unexpected Insights
HCE revealed several unexpected results
ClusterView provided a few
TimeSearcher 1 provided some for the time series data
Spotfire contributed to 2 unexpected insights
Hypotheses
A few insights led to hypotheses:
Spotfire: 3; ClusterView: 2; TimeSearcher 1: 1; HCE: 1
26
Tools vs. Datasets
27
Insight Categories
Overall gene expression: overview of genes in general
Expression patterns: searching for patterns is critical; clustering is useful
Grouping: some users wanted to group genes; GeneSpring enables grouping
Detail information: users want detailed information about genes that are familiar to them
28
Visual Representations and Interactions
Although some tools offer many visualization techniques, users tend to use only a few
Spotfire users preferred heat maps
GeneSpring users preferred parallel coordinates
Lupus dataset: visualized best with heat maps
Most users preferred the outputs of clustering algorithms
HCE is not useful when a particular column arrangement matters
29
Running out of time, so wrap up
Use a visualization tool (that’s why we’re here!)
Spotfire: best general performance
GeneSpring: hard to use
The dataset dictates the best tool!
Time series data: TimeSearcher
Others: Spotfire, GeneSpring?
Interaction is the key
Grouping and clustering are necessary features
30
Critique
In all fairness, measuring insights is really hard! Here are some possible issues:
Subjectivity
The experiment relies on users always thinking aloud
It also depends on a domain expert to evaluate insights
Results may vary widely based on participants’ expertise (only two per tool-dataset pair)
Some insight characteristics are inherently subjective: domain value, breadth vs. depth
31
Critique
How does one count insights?
It assumes honest reporting by participants
Some insights may be of no great value
What if a discovery just reaffirms a known fact? Is that an insight?
Measuring the time taken to reach an insight
Instead of measuring from the beginning of the session, maybe we should measure from the last insight
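The difference between the two timing choices in the last point can be shown with a short sketch. The function names and the toy times below are hypothetical; the only assumption beyond the slide is that the first insight, having no predecessor, is still timed from the start of the session.

```python
def minutes_since_start(insight_minutes):
    """Current measure: each insight timed from the beginning of the session."""
    return sorted(insight_minutes)


def minutes_since_previous(insight_minutes):
    """Alternative from the critique: time each insight from the previous one.

    The first insight has no predecessor, so it is timed from session start.
    """
    ordered = sorted(insight_minutes)
    previous = [0.0] + ordered[:-1]
    return [t - p for t, p in zip(ordered, previous)]


times = [4.0, 11.5, 30.0]  # toy data, not from the study
print(minutes_since_start(times))     # → [4.0, 11.5, 30.0]
print(minutes_since_previous(times))  # → [4.0, 7.5, 18.5]
```

Measured from session start, later insights look slow simply because they come later; the inter-insight gaps instead show how long each new discovery took once the previous one was made.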