Scaling real-time visualisations for Elections 2014
S Anand, Chief Data Scientist, Gramener
Scaling Real-time Visualisations for Elections 2014
@sanand0
https://gramener.com/election/story.ddp
What’s the largest number of people that stood in an election?
“We’ll cross 5 million visitors tomorrow”
[Architecture diagram. Labels: Nielsen’s server; ETL; Candidate Votes; Visualisation template; Gramener Visualisation server; nginx; 1 2 3 4; Real time; Azure Ubuntu server (Singapore); SQL Server; CNN Windows server (Noida, India); rsync; CNN WinXP laptop (Noida, India); Every 10 seconds]
Let’s optimize backwards
WHY NGINX?
http://wiki.dreamhost.com/Web_Server_Performance_Comparison
Split load
Cache it
Serve static files directly
Compress content
1,518 KB gzipped to 379 KB
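The effect of compressing content can be checked offline before touching the server configuration. The following is a minimal Python sketch, not the nginx setup from the talk: the payload is hypothetical, and the printed sizes will differ from the slide's figures.

```python
import gzip
import json


def gzip_sizes(payload, level=6):
    """Return (raw size, gzipped size) in bytes for a bytes payload."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(payload), len(compressed)


# Hypothetical, repetitive payload standing in for a results page.
rows = [
    {"constituency": f"C{i % 543}", "party": "XYZ", "votes": i * 7}
    for i in range(5000)
]
payload = json.dumps(rows).encode("utf-8")

raw, packed = gzip_sizes(payload)
print(f"{raw / 1024:.0f} KB gzipped to {packed / 1024:.0f} KB")
```

Repetitive text such as JSON or HTML tends to compress well, which is why the saving on the slide is so large.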
wiki.nginx.org
h5bp.github.io
Only 1 image
… but a 3MB SVG
Kraken.io
Inkscape
SVG Compression (file size by decimal places)
2 decimal places: 95KB
3 decimal places: 145KB
4 decimal places: 613KB
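Why decimal places matter can be seen by rounding the numbers in an SVG by hand. This is a rough sketch of the idea only, not what Kraken.io or Inkscape do internally. The function name and the sample path are made up, and it assumes numbers are separated by whitespace, commas, letters or a sign.

```python
import re

# Matches decimal numbers such as 12.345678 or -0.5.
# Assumption: numbers are not written back to back (e.g. "1.5.5").
NUMBER = re.compile(r"-?\d*\.\d+")


def round_svg_numbers(svg_text, places=2):
    """Round every decimal number in the SVG text to `places` decimal places."""
    def shorten(match):
        text = f"{float(match.group(0)):.{places}f}"
        if "." in text:
            # Drop trailing zeros, then a dangling decimal point.
            text = text.rstrip("0").rstrip(".")
        return text

    return NUMBER.sub(shorten, svg_text)


svg = '<path d="M10.123456,20.987654 L30.500000,40.000100"/>'
print(round_svg_numbers(svg, 2))  # <path d="M10.12,20.99 L30.5,40"/>
```

Fewer digits per coordinate means fewer bytes per path, at the cost of some precision in the drawing.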
Now, optimize the rendering
We need these filters to work instantly
We cannot afford a server request for every filter change
We need client-side content generation, driven by data
How content is written
Declarative: HTML, XML, Prolog
Procedural: JavaScript, Python, Java
How data is used to write it
Templates: create HTML strings
Binding: map attributes to functions

              Templates     Binding
Declarative   Underscore    knockout
Procedural    jQuery        d3
Let’s make a bar chart with each of these
Examples of representative libraries
https://github.com/sanand0/fifthel-2014
underscore: declare a template
jQuery: procedurally create the HTML
knockout: declaratively bind data to HTML
d3: procedurally bind data to elements and attributes
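The difference between declaring a template and building the markup procedurally can be sketched in any language. The following is a Python analogue only, not the underscore or jQuery API; the data, class names and pixel scale are invented.

```python
from string import Template
from xml.etree import ElementTree as ET

# Hypothetical data: (label, value) pairs, illustrative only.
data = [("A", 30), ("B", 12), ("C", 5)]

# Template style: declare the shape of one bar, then substitute data into it.
BAR = Template('<div class="bar" style="width:${width}px">${label}</div>')


def bars_from_template(rows):
    """Fill the declared template once per row and join the strings."""
    return "".join(
        BAR.substitute(width=value * 10, label=label) for label, value in rows
    )


# Procedural style: build each element step by step in code.
def bars_procedural(rows):
    """Create the elements one at a time and serialise the tree."""
    chart = ET.Element("div", {"class": "chart"})
    for label, value in rows:
        attrs = {"class": "bar", "style": f"width:{value * 10}px"}
        bar = ET.SubElement(chart, "div", attrs)
        bar.text = label
    return ET.tostring(chart, encoding="unicode")


print(bars_from_template(data))
print(bars_procedural(data))
```

Both produce the same bars. The template keeps the markup visible in one place, while the procedural version keeps full control in code.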
Finally, optimize data
1.5 MB of data every second
but some of it is static, some is redundant,
and some is misspelt or wrong
Correct mis-spellings
Load just what you need (query time reduced by 70%)
Normalise static data
Refresh only the changed data
Redundancy
JSON?
When gzipped, JSON is no larger than CSV
JSON is natively parsed and more flexible
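The JSON versus CSV comparison is easy to repeat on your own data. This is a minimal sketch with invented rows and field names. The sizes it prints depend entirely on the data and do not reproduce the measurement behind the slide.

```python
import csv
import gzip
import io
import json

# Hypothetical rows; field names and values are made up for illustration.
rows = [
    {"constituency": f"Seat {i}", "candidate": f"Candidate {i}", "votes": i * 13 % 50000}
    for i in range(2000)
]


def as_json(records):
    """Serialise records as compact JSON bytes."""
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


def as_csv(records):
    """Serialise records as CSV bytes with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


for name, payload in (("JSON", as_json(rows)), ("CSV", as_csv(rows))):
    gzipped = len(gzip.compress(payload))
    print(name, len(payload), "bytes raw,", gzipped, "bytes gzipped")
```

Raw JSON repeats every key on every row, but gzip removes most of that repetition, which is the point the slide makes.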
27KB
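Refreshing only the changed data amounts to sending a delta between two snapshots. The following is a minimal sketch under stated assumptions: the snapshots are dictionaries keyed by a hypothetical candidate id, the counts are invented, and deleted keys are not handled.

```python
def changed_rows(previous, current):
    """Return only the entries of `current` that are new or differ from `previous`."""
    return {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }


def apply_delta(previous, delta):
    """Merge a delta into the previous snapshot to rebuild the current one."""
    merged = dict(previous)
    merged.update(delta)
    return merged


# Hypothetical snapshots keyed by candidate id; the counts are made up.
before = {"c1": 1200, "c2": 950, "c3": 40}
after = {"c1": 1200, "c2": 1010, "c3": 40, "c4": 5}

delta = changed_rows(before, after)
print(delta)  # {'c2': 1010, 'c4': 5}
```

The client keeps the last snapshot and applies each delta with `apply_delta`, so unchanged rows are never sent again.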
“We’ll cross 5 million visitors tomorrow”
[Chart: visits by hour; x-axis 1 to 23, y-axis 0 to 1,400,000]
Half a million just in the first hour
Over 1.3 million in the next!
10 million visits election day
Does age make a difference? Do old candidates win less often?
[Chart: Lok Sabha (2004 onwards), candidates and win % by age group]
Age groups: 25-30, 30-35, 35-40, 40-45, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90
Win %: The number of winning candidates as a % of candidates in the age group (axis 0% to 40%; data labels as extracted: 1%, 2%, 4%, 6%, 9%, 11%, 14%, 11%, 16%, 18%, 22%, 22%, 33%)
Candidates: The number of candidates in each age group (axis 0 to 2,500)
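The Win % measure defined above can be computed as in the sketch below. The records are hypothetical and are not the Lok Sabha data. The 5-year groups are assumed to include their lower bound, which the chart labels do not confirm.

```python
from collections import defaultdict


def win_rate_by_age(candidates, width=5):
    """Return {age group label: win %} using `width`-year age groups."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for age, won in candidates:
        start = age // width * width  # assumed: lower bound is inclusive
        label = f"{start}-{start + width}"
        totals[label] += 1
        wins[label] += int(won)
    return {label: 100 * wins[label] / totals[label] for label in totals}


# Hypothetical (age, won) records; not the data behind the chart.
sample = [(27, False), (29, False), (52, True), (53, False), (54, False), (51, False)]
print(win_rate_by_age(sample))  # {'25-30': 0.0, '50-55': 25.0}
```

Reading Win % next to the candidate count matters, since a group with few candidates can show a high percentage from a handful of winners.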
Name length