Post on 22-Oct-2015
Adam Hitchcock @NorthIsUp
Scaling Realtime at DISQUS
Sunday, 17 March, 13
If this is interesting to you...
we’re hiring: disqus.com/jobs
what is DISQUS?
why do realtime?
๏ getting new data to the user asap
๏ for increased engagement
๏ and it looks awesome
๏ and we can sell (or trade) it
http://github.com/NorthIsUp/orbital2
http://map.labs.disqus.com
DISQUS sees a lot of traffic
Google Analytics: Feb 2013 - March 2012
realertime
๏ currently active on all DISQUS sites
๏ tested ‘dark’ on our existing network
๏ during testing:
  ๏ 1.5 million concurrently connected users
  ๏ 45 thousand new connections per second
  ๏ 165 thousand messages/second
  ๏ <.2 seconds latency end to end
so, how did we do it?
Node.js and MongoDB!
This is PyCon. We used Python.
and some other Technology You Know™
thoonk redis queue
some python glue
nginx push stream
and long(er) polling
architecture overview
old-june
New Posts -> DISQUS -> memcache
DISQUS embed clients: poll memcache every 5 seconds
june-july
New Posts -> DISQUS -> redis pub/sub -> Flask FE cluster -> HA Proxy -> DISQUS embed clients
july-october
New Posts -> DISQUS -> redis queue -> “python glue” (Gevent server) -> redis pub/sub -> Flask FE cluster -> HA Proxy -> DISQUS embed clients
august-october
New Posts -> DISQUS -> redis queue -> “python glue” (Gevent server) -> redis pub/sub -> Flask FE cluster -> HA Proxy -> DISQUS embed clients
Server counts labeled on the diagram: 2, 14 BIG, 6 servers, 5 servers
lots of servers, we can do better
october-now
New Posts -> DISQUS -> redis queue -> “python glue” (Gevent server) -> http post -> nginx pub endpoint -> nginx + push stream module -> DISQUS embed clients
october-now
Server counts labeled on the same diagram: 2 and 5
Why still 5 for this? Network memory restriction; we can’t fix this without kernel hacking, tweaking, etc.
(if you know how, tell us, then apply for a job, then fix it for us)
october-now
New Posts -> django -> thoonk queue -> Formatter -> Publishers -> http post -> nginx pub endpoint -> nginx + push stream module -> DISQUS embed clients
Publishers -> other realtime stuff
the thoonk queue
๏ django post_save and post_delete hooks
๏ thoonk is a queue on top of redis
๏ implemented as a DFA
๏ provides job semantics
  ๏ useful for end to end acking
  ๏ reliable job processing in distributed system
๏ did I mention it’s on top of redis?
  ๏ uses zset to store items == ranged queries
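To make "job semantics" and "end to end acking" concrete, here is a minimal in-memory sketch in Python 3. It is not thoonk's API: the class and method names are hypothetical, and a plain dict of claim times stands in for the redis sorted set that makes "which jobs were claimed too long ago?" a ranged query.

```python
import time
import uuid


class SketchJobQueue:
    """In-memory stand-in for a job queue with ack semantics.

    Hypothetical API for illustration only; this is not thoonk's interface.
    """

    def __init__(self):
        self.available = []  # job ids waiting for a worker, oldest first
        self.claimed = {}    # job id -> time the job was claimed
        self.payloads = {}   # job id -> job body

    def put(self, payload):
        job_id = uuid.uuid4().hex
        self.payloads[job_id] = payload
        self.available.append(job_id)
        return job_id

    def get(self, now=None):
        """Claim the oldest job; it stays tracked until finish() acks it."""
        if not self.available:
            return None
        job_id = self.available.pop(0)
        self.claimed[job_id] = time.time() if now is None else now
        return job_id, self.payloads[job_id]

    def finish(self, job_id):
        """Ack: the job is done and can be forgotten."""
        del self.claimed[job_id]
        del self.payloads[job_id]

    def requeue_stalled(self, timeout, now=None):
        """Ranged query over claim times: retry jobs claimed too long ago."""
        now = time.time() if now is None else now
        stalled = [j for j, t in self.claimed.items() if now - t > timeout]
        for job_id in stalled:
            del self.claimed[job_id]
            self.available.append(job_id)
        return stalled
```

A worker that crashes between get() and finish() leaves its job in the claimed set, so a later requeue_stalled() call makes it available again. That is the reliability the ack buys, at the cost of extra bookkeeping per job.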
the python glue
๏ listens to a thoonk queue
๏ cleans & formats message
  ๏ this is the final format for end clients
  ๏ compress data now
๏ publish message to nginx and other firehoses
  ๏ forum:id, thread:id, user:id, post:id
Formatter
Publishers
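A rough Python 3 sketch of the "format once, publish to many channels" idea. The field names (forum, thread, user, id), the use of zlib for compression, and the publish callback are all assumptions for illustration; the slides do not show the real formatter.

```python
import json
import zlib


def channels_for(post):
    """Build the channel names a post fans out to (kind:id pairs)."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Serialize and compress once, so every publisher reuses the bytes."""
    body = json.dumps(post, sort_keys=True).encode("utf-8")
    return zlib.compress(body)


def fan_out(post, publish):
    """Format once, then hand the same payload to each channel."""
    payload = format_message(post)
    for channel in channels_for(post):
        publish(channel, payload)
```

Doing the formatting and compression here, before the fan-out, means the expensive work happens once per post rather than once per channel or per subscriber.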
gevent is nice
# the code is too big to show here, so just import it
# http://bitly.com/geventspawn
from realertime.lib.spawn import Watchdog
from realertime.lib.spawn import TimeSensitiveBackoff
data pipelines
class Pipeline(object):
    def parse_data(self, data):
        # NotImplementedError is the exception; NotImplemented is not raisable
        raise NotImplementedError('No ParserMixin used')

    def compute_data(self, data, parsed_data):
        raise NotImplementedError('No ComputeMixin used')

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError('No PublisherMixin used')

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)
Example Mixins
import json
import urllib2  # Python 2, as on the original slides

class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)

class AnonymizeDataMixin(Pipeline):
    # compute step: throw everything away
    def compute_data(self, data, parsed_data):
        return {}

class SuperSecureEncryptDataMixin(Pipeline):
    # compute step: assumes parsed_data is a str (Python 2 rot13 codec)
    def compute_data(self, data, parsed_data):
        return parsed_data.encode('rot13')

class HTTPPublisherMixin(Pipeline):
    def publish_data(self, data, parsed_data, computed_data):
        u = urllib2.urlopen(self.dat_url, computed_data)
        return u

class FilePublisherMixin(Pipeline):
    def publish_data(self, data, parsed_data, computed_data):
        with open(self.output, 'a') as f:
            f.write(computed_data)

Finished Pipeline
class JSONAnonHTTPPipeline(
        JSONParserMixin, AnonymizeDataMixin, HTTPPublisherMixin):
    pass

class JSONSecureHTTPPipeline(
        JSONParserMixin, SuperSecureEncryptDataMixin, HTTPPublisherMixin):
    pass

class JSONAnonFilePipeline(
        JSONParserMixin, AnonymizeDataMixin, FilePublisherMixin):
    pass
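For anyone who wants to run the composition idea, here is a self-contained Python 3 sketch. The base class follows the slides; KeepKeysMixin and ListPublisherMixin are hypothetical stand-ins (a whitelist filter and an in-memory publisher) chosen so the example needs no network or files.

```python
import json


class Pipeline:
    def parse_data(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute_data(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class KeepKeysMixin(Pipeline):
    # Hypothetical compute step: keep only whitelisted keys.
    keep = ("id", "thread")

    def compute_data(self, data, parsed_data):
        return {k: parsed_data[k] for k in self.keep if k in parsed_data}


class ListPublisherMixin(Pipeline):
    # Hypothetical publisher: collects output in memory instead of doing I/O.
    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(json.dumps(computed_data, sort_keys=True))
        return self.published[-1]


class JSONKeepKeysListPipeline(
        JSONParserMixin, KeepKeysMixin, ListPublisherMixin):
    pass
```

Each mixin overrides exactly one stage, so Python's method resolution order picks the parser, compute step, and publisher from whichever mixin defines it, and falls back to the NotImplementedError in Pipeline when a stage is missing.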
real live DISQUS code
class FEOrbitalNginxMultiplexer(
        SchemaTransformerMixin,
        JSONFormatterMixin,
        SelfChannelsMixin,
        HTTPPublisherMixin):

    def __init__(self, domains, api_version=1):
        schema_namespace = 'orbital'
        self.channels = ('orbital', )
        super(FEOrbitalNginxMultiplexer, self).__init__(
            domains=domains,
            api_version=api_version,
            schema_namespace=schema_namespace)

class FEPublicAckingMultiplexer(
        PublicTransformerMixin,
        JSONFormatterMixin,
        FEChannelsMixin,
        ThoonkQueuePubSubPublisherMixin):

    def __init__(self, domains, api_version):
        schema_namespace = 'general'
        super(FEPublicAckingMultiplexer, self).__init__(
            domains=domains,
            api_version=api_version,
            schema_namespace=schema_namespace)
nginx push stream
๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream
http://wiki.nginx.org/HttpPushStreamModule
nginx push stream
๏ Replaced webservers and Redis Pub/Sub
๏ But starting with Pub/Sub was important for us
  ๏ Encouraged us to over publish on keys
nginx push stream
๏ Turned on for 70% of our network...
๏ ~950K subscribers (peak single machine)
๏ peak 40 MBytes/second (per machine)
๏ CPU usage is still well under 15%
๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)
config push stream
location = /pub {
    allow 127.0.0.1;
    deny all;

    push_stream_publisher admin;
    set $push_stream_channel_id $arg_channel;
}

location ^~ /sub/ {
    # to maintain api compatibility we need this
    location ~ /sub/(.*)/(.*)$ {
        # Url encoding things? $1%3A2$2
        set $push_stream_channels_path $1:$2;

        push_stream_subscriber streaming;
        push_stream_content_type application/json;
    }
}
examples
# Subs
curl -s 'localhost/sub/forum/cnn'
curl -s 'localhost/sub/thread/907824578'
curl -s 'localhost/sub/user/northisup'

# Pubs
curl -s -X POST 'localhost/pub?channel=forum:cnn' \
    -d '{"some sort": "of json data"}'

curl -s -X POST 'localhost/pub?channel=thread:907824578' \
    -d '{"more": "json data"}'

curl -s -X POST 'localhost/pub?channel=user:northisup' \
    -d '{"the idea": "I think you get it by now"}'
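The same calls can be sketched from Python 3 with only the standard library. The helper names and the base_url parameter are hypothetical; the URL shapes follow the curl examples above. The request is built but not sent, so nothing here needs a running nginx.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def channel_name(kind, ident):
    """Channel ids follow the kind:id convention, e.g. forum:cnn."""
    return "%s:%s" % (kind, ident)


def subscribe_url(base_url, kind, ident):
    """Subscribers use the path form: /sub/<kind>/<id>."""
    return "%s/sub/%s/%s" % (
        base_url, quote(str(kind), safe=""), quote(str(ident), safe=""))


def build_publish_request(base_url, kind, ident, message):
    """Build (but do not send) the POST that publishes one message."""
    # Leave ':' unescaped so the query string matches the curl examples.
    channel = quote(channel_name(kind, ident), safe=":")
    body = json.dumps(message).encode("utf-8")
    return Request(
        "%s/pub?channel=%s" % (base_url, channel),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would perform the publish. Keeping the colon literal is a deliberate choice in this sketch: the subscriber location rebuilds the channel as $1:$2, so an escaped %3A on the publish side might name a different channel.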
measure nginx
location = /push-stream-status {
    allow 127.0.0.1;
    deny all;

    push_stream_channels_statistics;
    set $push_stream_channel_id $arg_channel;
}
long(er) polling
onProgress: function () {
    var self = this;
    var resp = self.xhr.responseText;
    var advance = 0;
    var rows;

    // If server didn't push anything new, do nothing.
    if (!resp || self.len === resp.length) return;

    // Server returns JSON objects, one per line.
    rows = resp.slice(self.len).split('\n');

    _.each(rows, function (obj) {
        advance += (obj.length + 1);
        obj = JSON.parse(obj);
        self.trigger('progress', obj);
    });
    self.len += advance;
}
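The same offset-tracking idea, sketched in Python 3 for a client that reads a growing response. It assumes one JSON object per line, as the comment in the handler says. Waiting for a complete line before parsing and skipping blank lines are additions in this sketch, not something the slide shows; LineBuffer is a hypothetical name.

```python
import json


class LineBuffer:
    """Parse newline-delimited JSON from a response that grows over time."""

    def __init__(self):
        self.offset = 0  # characters of the response already consumed

    def feed(self, response_text):
        """Return objects for each complete new line in response_text."""
        new_text = response_text[self.offset:]
        # Only consume up to the last newline; a partial trailing line
        # is left for the next call, when the rest has arrived.
        end = new_text.rfind("\n")
        if end == -1:
            return []
        complete = new_text[:end]
        self.offset += end + 1
        return [json.loads(line) for line in complete.split("\n") if line.strip()]
```

Each call receives the full response text so far, mirroring how responseText accumulates on a long-lived XHR, and only the unread tail is parsed.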
Soon... EventSource
// Currently EventSource has CORS issues
var ev = new EventSource(dat_url);
ev.addEventListener("Post", handlePostEvent);
test, measure, repeat
test
๏ Darktime
  ๏ use existing network to load test
  ๏ (user complaints when it didn’t work...)
๏ Darkesttime
  ๏ load testing a single thread
๏ have knobs you can twiddle
measure
๏ measure all the things!
๏ especially when the numbers don’t line up
๏ measuring is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ Sentry for measuring exceptions
pretty graphs
how does it really scale?
[graph annotations: POPE; white smoke; francis announced]
maths
it’s been a busy few weeks
wha?
๏ People do weird stuff with your stuff
๏ turned off this server in Oct 2012
๏ Still getting 100 req/sec
lessons
๏ do hard (computation) work early
๏ end-to-end acks are good, but expensive
๏ redis/nginx pubsub is effectively free
If this was interesting to you...
psst, we’re hiring: disqus.com/jobs
special thanks
๏ the team at DISQUS
  ๏ like jeff a.k.a. @nfluxx who had to review all my code
๏ and especially our dev-ops guys
  ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module
slide full o’ links
๏ Nginx push stream module: http://wiki.nginx.org/HttpPushStreamModule
๏ Thoonk (redis queue): http://github.com/andyet/thoonk.py
๏ Sentry (distributed traceback aggregation): http://github.com/dcramer/sentry
๏ Gevent (python coroutines and greenlets): http://gevent.org/
๏ Scales (in-app metrics): http://github.com/Greplin/scales
code.disqus.com
Come find me here!
PyCon 2013
Santa Clara Convention Center, Hall A-B
Santa Clara, CA
[booth floor plan, revised 1/9/2013]
we are still hiring
psst, we’re hiring: disqus.com/jobs
Questions I have
๏ What is the best kernel config for webscale concurrency? Nginx?
๏ I <3 gevent, but what if I want to pypy?
๏ Nginx + lua? Seems kind of awesome.
๏ Composing data pipelines: good or bad?
๏ I didn’t have time to mention:
  ๏ Kafka, what is it good for?
  ๏ Seriously, why not RabbitMQ?
Adam Hitchcock @NorthIsUp
DISQUSsion?