Data visualisations quality aspects
european-environment-agency -
Transcript of Data visualisations quality aspects
Data visualisation
DaViz, quality and interoperability
About me
● Web technology manager (EEA)
● M.Sc. in Computer Science (Lund University, SWE)
● Surveyor (ITA)
● 15 years in IT and web development (programming and project management)
● Junior Researcher: Machine vision for surveillance cameras at Axis
● E-commerce websites for telecom industry
● Product Owner of DaViz and many powerful Plone Add-ons
● Technical manager for the EEA main portal and CMS
● Data Visualisation, Data Science, Open Data, Statistics, Semantic Web, Linked Data, Usability and User Experience, Artificial Intelligence, Agile/Lean management…
DaViz, what and why
desktop based
web-based
Remove any visual clutter
before
after
Unsorted (Don’t) Sorted (Do)
Remove legend when not needed
There is no need to have a legend when there is only one data category shown. What is measured can be added to the title or axis.
Avoid pie charts and donuts
The human mind thinks linearly: we can easily compare lengths/heights of line segments but when it comes to angles and areas most of us can't judge them well.
Do you see what works best?
Avoid stacked bar charts
Don’t Do
Correlation does not imply causation
● see also "Superimposing time series is the biggest source of silly theories"
Per capita consumption of cheese correlates with number of people who died by becoming tangled in their bedsheets
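A minimal sketch of why superimposed time series mislead: two series that merely both rise over time show a high correlation, which can vanish once the shared trend is removed. The numbers below are made up for illustration, and comparing year-to-year changes is only one common sanity check, not something taken from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def differences(values):
    """Year-to-year changes, which removes a shared upward trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical, unrelated yearly series that both happen to grow.
series_a = [10.0, 11.2, 11.9, 13.1, 14.0, 15.2, 15.8, 17.1]
series_b = [200, 205, 221, 228, 249, 254, 270, 275]

raw_r = pearson(series_a, series_b)
diff_r = pearson(differences(series_a), differences(series_b))

print(f"correlation of raw series:    {raw_r:.2f}")
print(f"correlation of yearly changes: {diff_r:.2f}")
```

The raw series correlate strongly only because both trend upwards; their yearly changes do not move together.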
Use map only when needed
The map on the right is trying to show too much information at once.
Moreover, the data would be much easier to compare with a basic bar chart (below).
Bar charts placed on a map are difficult to compare, since they are not aligned. A single bar chart would make it much easier and more precise to compare countries.
Countries with a relatively small area are hidden, while countries with a large area are made more prominent (intentional?). Is a country’s area really relevant here? Is the geo-distribution important? How to compare properly?
Colors
● Different colors should be used for different categories (e.g., male/female, types of fruit), not different values in a range (e.g., age, temperature).
● Do not use rainbows for range values.
● If you want color to show a numerical value, use a range that goes from white to a highly saturated color in one of the universal color categories.
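As an illustration of the white-to-saturated range described above, here is a small sketch that builds such a ramp. The function name `sequential_ramp` and the dark blue end colour are hypothetical choices, and plain linear interpolation in RGB is only an approximation (it is not perceptually uniform).

```python
def sequential_ramp(end_rgb, steps):
    """Return `steps` hex colours going from white to `end_rgb`.

    Uses simple linear interpolation in RGB space.
    """
    if steps < 2:
        raise ValueError("a ramp needs at least two steps")
    white = (255, 255, 255)
    ramp = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at white, 1.0 at the saturated end
        rgb = tuple(round(w + (e - w) * t) for w, e in zip(white, end_rgb))
        ramp.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return ramp


# Five classes from white to a dark, saturated blue (illustrative colour).
blues = sequential_ramp((8, 48, 107), 5)
print(blues)
```

Low values map to the light end and high values to the saturated end, so the order of the colours matches the order of the numbers.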
Don’t Do
Don’t forget 7%-10% of your male audience (color deficiency)
what color-deficient people see
original chart
Use Vischeck to test your images. If the chart is readable in black and white, then it is even better!
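The black-and-white test can be roughly automated. The sketch below converts each palette colour to a grey level and flags neighbours that end up too close. It assumes the common 0.299/0.587/0.114 luma weights, and the threshold `min_gap=40` is an arbitrary illustrative value. It does not simulate colour deficiency itself; a tool such as Vischeck is still needed for that.

```python
def luma(hex_colour):
    """Approximate grey level (0-255) of a '#rrggbb' colour."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_close_in_grey(palette, min_gap=40):
    """Return pairs of colours whose grey levels differ by less than min_gap."""
    by_luma = sorted((luma(c), c) for c in palette)
    pairs = []
    for (l1, c1), (l2, c2) in zip(by_luma, by_luma[1:]):
        if l2 - l1 < min_gap:
            pairs.append((c1, c2))
    return pairs


# Red and green look distinct in colour but similar once printed in grey.
palette = ["#ff0000", "#00a000", "#0000ff"]
clashes = too_close_in_grey(palette)
print(clashes)  # [('#ff0000', '#00a000')]
```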
Choose your chart type wisely
Online tools like the Data Visualization Catalogue or a decision diagram [2006, A. Abela] help you find the right chart for your data.
Data provenance, trust, legitimacy
● Adding data source information lends credibility to your chart and builds trust in it
● When adding source info on your chart, distinguish datasource info from figure source info
● Disclose who financed the data visualisation work and data collection
● Disclose your data and methodology -> reproducible and verifiable
from: “Legitimacy, transparency, reproducibility”, Andrea Saltelli, JRC, Head of the Econometrics and Applied Statistics Unit
Show the level of confidence, build trust
Ask these questions before publishing your chart, and be prepared for the critiques:
1. What was the source of your data?
2. How well do the sample data represent the population?
3. Does your data distribution include outliers? How did they affect the results?
4. What assumptions are behind your analysis? Might certain conditions render your assumptions and your model invalid?
5. Why did you decide on that particular analytical approach? What alternatives did you consider?
6. How likely is it that the independent variables are actually causing the changes in the dependent variable? Might other analyses establish causality more clearly?
Typical statistical error - EU trends
See online example
It is not statistically correct to make a trend analysis of data across time when the data in question (or sample) is not representative of the whole. E.g. EU12 is not representative of EU25 or EU28, therefore the data cannot be used to state a trend for the entire EU as it is in 2014: the EU has changed!
very important info!
Typical statistical error - including no data
See online example
We cannot say “20.9% of our colleagues are male”. We can only say “20.9% of the sample we met are male”, which does not say much about the entire population (the entire staff).
If we have used a proper sampling technique, e.g. randomly selecting the staff, we have a sample (580 people) that is representative of the whole (1000 people) with a 95% confidence level and a margin of error of 2.64%. We can now say that 39.7% ± 2.64% of our colleagues are male, with a confidence level of 95%, and that is a big difference from what we said in the previous slide (20.9%)!
https://www.checkmarket.com/market-research-resources/sample-size-calculator/
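The 2.64% figure can be reproduced with the standard margin-of-error formula for a proportion, including the finite population correction. This is a sketch under assumptions: the worst-case proportion of 0.5 and z = 1.96 for 95% confidence are the usual defaults, and it is not confirmed that the linked calculator uses exactly this formula.

```python
import math


def margin_of_error(sample_size, population_size, z=1.96, proportion=0.5):
    """Margin of error for a proportion from a simple random sample.

    z=1.96 corresponds to a 95% confidence level; proportion=0.5 is the
    worst case (largest margin). Both are assumptions of this sketch.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: the sample is a large share of the staff.
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc


moe = margin_of_error(sample_size=580, population_size=1000)
print(f"margin of error: {moe:.2%}")  # margin of error: 2.64%
```

The correction term matters here: without it, a sample of 580 would give a margin of about 4%, because the formula would ignore that more than half of the staff were sampled.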
Show the level of confidence
Tell your audience how confident you are in your assertions. Include error bars any time you use data to make an argument.
source: The importance of uncertainty, Berkeley Science review. http://sciencereview.berkeley.edu/importance-uncertainty/
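One simple way to obtain the length of an error bar is the mean plus or minus 1.96 standard errors. The sketch below uses made-up measurements and assumes a normal approximation; for small samples like this one, a t-distribution multiplier would be more appropriate.

```python
import math
import statistics


def mean_with_error(values, z=1.96):
    """Return (mean, half-width of an approximate 95% interval)."""
    mean = statistics.mean(values)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Hypothetical repeated measurements of the same quantity.
measurements = [4.1, 3.8, 4.4, 4.0, 3.7]
mean, half_width = mean_with_error(measurements)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

The half-width is what a plotting library would take as the error bar length around each plotted value.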
Get it professionally reviewed
Have a statistician review your analysis and your representation. You will be surprised at how many corrections and improvements you can achieve.
Welcome to data science!
source: http://sciencereview.berkeley.edu/article/first-rule-data-science/
I shall not use visualization to intentionally hide or confuse the truth which it is intended to portray. I will respect the great power visualization has in garnering wisdom and misleading the uninformed. I accept this responsibility willfully and without reservation, and promise to defend this oath against all enemies, both domestic and foreign.
Hippocratic oath for data scientists
VisWeek2011, Jason Moore, A code for ethics for data visualisations professionals
THANK YOU!
More resources: http://www.eea.europa.eu/data-and-maps/daviz/learn-more/