WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
Instant messenger with Python. Back-end development
Viacheslav Kakovskyi WebCamp 2016
Me!
@kakovskyi
● Python Developer at SoftServe
● Contributor of Atlassian HipChat (Python 2, Twisted)
● Maintainer of KPIdata (Python 3, asyncio)
Agenda
● What is 'instant messenger'?
● Related projects from my experience
● Messaging protocols
● Life of messaging platform
● Lessons learned
● Summary
● Further reading
What is 'instant messenger'?
● online chat
● real-time delivery
● short messages
What is 'instant messenger'?
● history search
● file sharing
● mobile push notifications
● video calling
● bots and integrations
Related projects from my experience
● Hosted chat for teams and enterprises
● Founded in 2009 by 3 students
● 100 000+ connected users
● 100+ nodes
● REST API for integrations and bots
● Built with Python 2 and Twisted
Messaging protocols
A protocol is about:
● Message format
● Allowed types of messages
● Limitations
● Routine
  ○ How to encode data?
  ○ How to establish/close connection?
  ○ How to authenticate?
  ○ How to encrypt?
Messaging protocols
● OSCAR (1997)
● XMPP (1999)
● Skype (2003)
● WebSocket-based (2011)
● MQTT, MTProto, DHT-based, etc.
XMPP
● XMPP - signaling protocol
● BOSH - transport protocol
● Started from Jabber in 1999
● XML as a message format
● Stanza - basic unit in XMPP
● Types of stanzas:
  ○ Message
  ○ Presence
  ○ Info/Query
XMPP
● Extensions defined by XEPs (XMPP Extension Protocols):
  ○ Bidirectional-streams Over Synchronous HTTP (BOSH)
  ○ Serverless messaging
  ○ File transfer, etc.
XMPP: Establishing a connection
Client:
<?xml version='1.0'?>
<stream:stream to='example.com' xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>
Server:
<?xml version='1.0'?>
<stream:stream from='example.com' id='someid' xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>
XMPP: Sending a message
Client:
<message from='[email protected]' to='[email protected]' xml:lang='en'>
  <body>Art thou not Romeo, and a Montague?</body>
</message>
Server:
<message from='[email protected]' to='[email protected]' xml:lang='en'>
  <body>Neither, fair saint, if either thee dislike.</body>
</message>
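A message stanza like the ones above can be produced and read back with the standard library alone. The following is a minimal sketch, not how any particular XMPP server does it: the helper names `build_message` and `parse_message` and the JIDs in the usage note are placeholders made up for illustration, and a real stanza would live inside an open `<stream:stream>` with the `jabber:client` namespace.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the reserved "xml:" prefix always maps to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_message(sender: str, recipient: str, body: str, lang: str = "en") -> str:
    """Serialize a minimal <message/> stanza (hypothetical helper)."""
    message = ET.Element("message", {"from": sender, "to": recipient})
    message.set(f"{{{XML_NS}}}lang", lang)
    # ElementTree escapes characters such as < and & in the body text.
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")


def parse_message(stanza: str) -> dict:
    """Extract the routing attributes and body from a serialized stanza."""
    root = ET.fromstring(stanza)
    return {
        "from": root.get("from"),
        "to": root.get("to"),
        "body": root.findtext("body"),
    }
```

Usage, with placeholder addresses: `parse_message(build_message("juliet@example.com", "romeo@example.net", "Hi"))` returns the sender, the recipient and the body as a dict.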
XMPP: Closing a connection
Client:
</stream:stream>
Server:
</stream:stream>
XMPP: Pros
● Robust and standardized
● Extendable via XEPs
● Secure
● Native support of multi-sessions
● Many client implementations
XMPP: Cons
● Overhead
  ○ Presence
  ○ Downloading the World on startup
● XML
  ○ Large documents
  ○ Expensive parsing
XMPP and Python
● Servers:
  ○ TwistedWords - good place to start
  ○ Tornado-based example
  ○ aioxmpp
  ○ XMPPFlask
  ○ Punjab - BOSH-server on Twisted
XMPP and Python
● Clients:
  ○ SleekXMPP - mature and solid
  ○ Slixmpp - asyncio-support
  ○ TwistedWords
  ○ Wokkel - Twisted-based
  ○ xmpp.py
● JS-client: Strophe.js
WebSocket-based solutions
● WebSocket - transport protocol
● Protocol standardized in 2011 by the IETF (RFC 6455); the browser API is specified by W3C
● Full-duplex communication channel
● JSON as a message format
● Custom message types
WebSocket: Establishing a connection
Client:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Server:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
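The `Sec-WebSocket-Accept` value in the server response is derived from the client's `Sec-WebSocket-Key`: RFC 6455 has the server append a fixed GUID to the key, take the SHA-1 digest, and Base64-encode it. The sketch below reproduces that step with the standard library; the function name `websocket_accept` is an illustrative choice, not part of any framework.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# The key from the handshake above yields the value the server sent back.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

In practice a WebSocket library performs this check; the computation is shown only to make the handshake less opaque.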
WebSocket: Sending a message
Client:
{
  "type": "message",
  "ts": 1469563519,
  "user": "kakovskyi",
  "text": "Hello, @WebCamp!"
}
Server:
{
  "type": "notification",
  "ts": 1469563519,
  "user": "WebCamp Bot",
  "text": "Howdy @kakovskyi?"
}
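Because WebSocket only carries the bytes, the application has to define and validate its own message types. A minimal sketch of that validation follows, assuming the four fields used on this slide are all mandatory and that `message` and `notification` are the only allowed types; both assumptions, and the function names, are illustrative.

```python
import json
import time

# Assumed for illustration: a real protocol defines its own set of types.
ALLOWED_TYPES = {"message", "notification"}
REQUIRED_FIELDS = {"type", "ts", "user", "text"}


def encode_message(msg_type: str, user: str, text: str, ts=None) -> str:
    """Build a JSON frame; the timestamp defaults to the current Unix time."""
    if msg_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    payload = {
        "type": msg_type,
        "ts": int(time.time()) if ts is None else ts,
        "user": user,
        "text": text,
    }
    return json.dumps(payload)


def decode_message(raw: str) -> dict:
    """Parse a JSON frame and reject anything that is not a known message."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("frame must be a JSON object")
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if payload["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown message type: {payload['type']!r}")
    return payload
```

Rejecting unknown types at the edge keeps the rest of the delivery code free of defensive checks.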
WebSocket: Closing a connection
Client:
0x8 (opcode of the Close control frame)
Server:
0x8 (Close frame sent in response)
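On the wire, `0x8` is the opcode of a Close control frame. As a rough sketch of what a server sends, the function below packs an unmasked Close frame with a status code and an optional reason, following the RFC 6455 frame layout. The name `build_close_frame` is made up for this example, and frames sent by a client would additionally have to be masked, which is not shown.

```python
import struct

FIN = 0x80           # final-fragment bit
OPCODE_CLOSE = 0x8   # Close control frame


def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Pack an unmasked (server-to-client) Close frame."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        # Control frames may not use the extended length encodings.
        raise ValueError("control frame payload must not exceed 125 bytes")
    header = bytes([FIN | OPCODE_CLOSE, len(payload)])
    return header + payload


# Status code 1000 means normal closure.
print(build_close_frame().hex())  # 880203e8
```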
WebSocket: Pros
● Supported by the majority of browsers
● Low latency
● Small bandwidth
● Easy to start development
WebSocket: Cons
● You need to develop your own signaling protocol
● Timeouts and reconnections have to be handled separately
WebSocket and Python
● Servers:
  ○ Autobahn - Twisted and asyncio implementations
  ○ aiohttp
  ○ Tornado
  ○ Flask-SocketIO
  ○ Flask-Sockets
WebSocket and Python
● Clients:
  ○ Autobahn
  ○ aiohttp
  ○ Tornado-based example
  ○ Vanilla websocket-client
● JS-client: SocketIO
Life of messaging platform
● Authentication
● Access control checks
● Delivery
  ○ Messages
  ○ User's presence
  ○ Push notifications
● History retrieval
● History search
Life of messaging platform
● Parsing
  ○ Protocol
  ○ Message content
● Dealing with file uploads
  ○ Security checks
  ○ Thumbnails distribution
● Multi-session support
● Reconnection handling
● Rate-limiting
Life of messaging platform
● Server keeps a connection open for every client
● Large number of long-lived concurrent connections
● A thread per connection is inefficient because of per-thread overhead
● Requires a select-style I/O multiplexing mechanism on the backend:
  ○ poll
  ○ epoll
  ○ kqueue
● Asynchronous Python frameworks are preferred for high-load solutions
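To illustrate what a select-style mechanism buys, here is a small sketch using the standard `selectors` module, which picks epoll, kqueue or a fallback depending on the platform. One thread watches several connections and touches only those that have data. The "clients" are simulated with `socket.socketpair()` and the names `ready_payloads` and `demo` are illustrative; asynchronous frameworks wrap the same idea in an event loop.

```python
import selectors
import socket


def ready_payloads(sockets: dict, timeout: float = 1.0) -> dict:
    """Return {name: bytes} for every registered socket with data waiting."""
    selector = selectors.DefaultSelector()  # epoll, kqueue, poll or select
    try:
        for name, sock in sockets.items():
            sock.setblocking(False)
            selector.register(sock, selectors.EVENT_READ, data=name)
        received = {}
        # A single call reports all readable sockets; idle ones cost nothing.
        for key, _events in selector.select(timeout=timeout):
            received[key.data] = key.fileobj.recv(4096)
        return received
    finally:
        selector.close()


def demo() -> dict:
    """Three simulated clients, two of which send something."""
    pairs = {name: socket.socketpair() for name in ("alice", "bob", "carol")}
    try:
        pairs["alice"][0].sendall(b"hi")
        pairs["carol"][0].sendall(b"hello")
        server_side = {name: pair[1] for name, pair in pairs.items()}
        return ready_payloads(server_side)
    finally:
        for client_end, server_end in pairs.values():
            client_end.close()
            server_end.close()
```

Calling `demo()` returns payloads for `alice` and `carol` only; the idle `bob` connection is never read.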
Life of messaging platform
● Authentication
  ○ OAuth2
  ○ Run encryption operations in a separate Python thread
  ○ Cache users' identities with Redis/Memcached
● Access-control checks
  ○ Make the checks lightweight and cheap
  ○ Raise an exception when an operation isn't permitted
EAFP: Easier to ask for forgiveness than permission
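In EAFP style the handler simply attempts the operation and lets a cheap check raise if it is not permitted, instead of querying permissions up front and branching on flags. The sketch below assumes an in-memory membership table; `ROOM_MEMBERS`, `AccessDenied` and the function names are hypothetical and stand in for whatever cache or store holds the real access rules.

```python
class AccessDenied(Exception):
    """Raised when a user attempts an operation they are not allowed to do."""


# Hypothetical membership table; in practice this could be a cached lookup.
ROOM_MEMBERS = {"backend": {"kakovskyi", "bot"}}


def check_can_post(user: str, room: str) -> None:
    """Cheap set lookup that raises instead of returning a flag."""
    if user not in ROOM_MEMBERS.get(room, ()):
        raise AccessDenied(f"{user} may not post to {room}")


def post_message(user: str, room: str, text: str, log: list) -> None:
    check_can_post(user, room)  # raises AccessDenied on failure
    log.append((room, user, text))


def handle_post(user: str, room: str, text: str, log: list) -> str:
    """Try the operation first and translate a refusal into a response."""
    try:
        post_message(user, room, text, log)
    except AccessDenied:
        return "forbidden"
    return "ok"
```

The exception is caught once at the edge, so the code between the handler and the check stays free of permission plumbing.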
Delivery
● Make message delivery fault-tolerant
● Limit the size of a message
● Filter the content of messages:
  ○ Users like to send chars that break all the things
● Reduce presence traffic, it could be a bottleneck for large chats
● Use an asynchronous broker for delivery when a user is offline (email or push)
  ○ Celery
  ○ RQ
  ○ Amazon Simple Queue Service
  ○ Huey
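The size limit and content filter can be applied in one place before a message enters the delivery pipeline. This is a sketch under assumptions: the 4096-byte limit is an arbitrary illustrative value, and the filter only drops control characters and lone surrogates. Whether to also strip format characters is a product decision, since some of them are needed for emoji sequences.

```python
import unicodedata

MAX_MESSAGE_BYTES = 4096  # assumed limit, chosen only for illustration


class MessageTooLarge(ValueError):
    """Raised when a message exceeds the configured size limit."""


def sanitize_text(text: str) -> str:
    """Drop control characters and lone surrogates, keep newlines and tabs."""
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in {"Cc", "Cs"}
    )
    return cleaned.strip()


def prepare_message(text: str) -> str:
    """Sanitize the text, then enforce the limit on the encoded size."""
    cleaned = sanitize_text(text)
    if len(cleaned.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise MessageTooLarge(f"message exceeds {MAX_MESSAGE_BYTES} bytes")
    return cleaned
```

The limit is checked on the UTF-8 encoded length because that is what reaches the wire and the storage layer.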
Life of messaging platform
● Push notifications
  ○ Vendors
    ■ Amazon SNS
    ■ APNS
    ■ Google Cloud Messaging
    ■ Firebase Cloud Messaging
  ○ Python tools
    ■ PyAPNs
    ■ Python-GCM
    ■ Pusher
● Be careful with device registration
● Make delivery of pushes fault-tolerant
History retrieval
● Return the last messages for every chat instantly
  ○ Use double writes
    ■ In-memory queue only for the last messages
    ■ Persistent storage for all the things
● Most history retrievals are for the last few days
  ○ Let's optimize the case
● Index messages by date
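The double-write idea can be sketched with a bounded in-memory queue per chat next to a persistent table indexed by chat and timestamp. SQLite and the limit of 50 recent messages are stand-ins chosen so the example runs anywhere; the class and method names are illustrative and say nothing about the storage used in production.

```python
import sqlite3
from collections import defaultdict, deque

RECENT_LIMIT = 50  # assumed size of the in-memory tail per chat


class HistoryStore:
    """Writes every message twice: to persistent storage and to a hot tail."""

    def __init__(self, db_path: str = ":memory:", recent_limit: int = RECENT_LIMIT):
        self._recent = defaultdict(lambda: deque(maxlen=recent_limit))
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(chat_id TEXT NOT NULL, ts INTEGER NOT NULL, "
            "user TEXT NOT NULL, text TEXT NOT NULL)"
        )
        # Index by chat and date so range queries do not scan the table.
        self._db.execute(
            "CREATE INDEX IF NOT EXISTS idx_messages_chat_ts "
            "ON messages (chat_id, ts)"
        )

    def append(self, chat_id: str, ts: int, user: str, text: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?)",
                (chat_id, ts, user, text),
            )
        # Updated only after the persistent write succeeded.
        self._recent[chat_id].append((ts, user, text))

    def recent(self, chat_id: str) -> list:
        """Serve the last messages from memory without touching the database."""
        return list(self._recent.get(chat_id, ()))

    def since(self, chat_id: str, ts_from: int) -> list:
        """Fetch older history from persistent storage by date."""
        rows = self._db.execute(
            "SELECT ts, user, text FROM messages "
            "WHERE chat_id = ? AND ts >= ? ORDER BY ts",
            (chat_id, ts_from),
        )
        return list(rows)
```

Opening a chat calls `recent()`, which never hits the database; scrolling further back falls through to `since()`.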
History search
● ElasticSearch is the default solution for full-text search
● @a_soldatenko: What is the best full text search engine for Python?
● Add timing for search requests
Parsing
● Protocol
  ○ Avoid pure-Python parsers; prefer C-based ones:
    ■ ujson
    ■ lxml
  ○ Run benchmarks against your typical cases
● Message content
  ○ Be careful with regular expressions
    ■ re2
    ■ pyre2
  ○ Alternative parsers in Python
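One low-risk way to adopt a faster parser is to import it optionally and measure it against a payload that looks like real traffic. The sketch below falls back to the standard `json` module when `ujson` is not installed; `parse_payload` and `benchmark` are illustrative names, and the timings depend entirely on the payload and the machine.

```python
import json
import timeit

try:
    import ujson as fast_json  # optional C-based parser (third-party)
except ImportError:
    fast_json = json  # fall back to the standard library


def parse_payload(raw: str):
    """Parse an incoming frame with whichever parser was selected."""
    return fast_json.loads(raw)


def benchmark(raw: str, number: int = 1000) -> dict:
    """Time both parsers on a representative payload (seconds per run set)."""
    return {
        "stdlib json": timeit.timeit(lambda: json.loads(raw), number=number),
        "selected parser": timeit.timeit(lambda: fast_json.loads(raw), number=number),
    }
```

Running `benchmark()` on typical messages, rather than on synthetic ones, shows whether the swap is worth the extra dependency.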
Dealing with file uploads
● Security checks
  ○ File upload vulnerabilities
  ○ Image upload
    ■ Decompression bomb
    ■ Other vulnerabilities with Pillow
  ○ Amazon S3 as file storage
    ■ boto
    ■ aiobotocore
    ■ botornado
● Thumbnails distribution
  ○ Delegate that to S3
  ○ Requested by a client even if not needed
Life of messaging platform
● Multi-session support
  ○ Set expiration time
  ○ Be ready to handle up to 4 simultaneous sessions per user
    ■ Desktop
    ■ Mobile
    ■ Tablet
    ■ Laptop
● Reconnection handling
  ○ Spin up a proxy layer between the messaging server and clients
● Rate-limiting
  ○ Limit the number of operations per user/group for heavy stuff
  ○ Leaky bucket
  ○ Throttling
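A leaky bucket can be sketched in a few lines: every operation adds to a level that drains at a constant rate, and an operation is refused when it would overflow the bucket. The class below is one common variant (the bucket used as a meter). The capacity, the leak rate and the injectable `clock` parameter are illustrative choices; a per-user limiter would keep one bucket per user or group.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: refuses operations that would overflow."""

    def __init__(self, capacity: float, leak_rate: float, clock=time.monotonic):
        self.capacity = capacity    # how much burst is tolerated
        self.leak_rate = leak_rate  # units drained per second
        self._clock = clock
        self._level = 0.0
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Drain whatever leaked out since the previous call.
        leaked = (now - self._updated) * self.leak_rate
        self._level = max(0.0, self._level - leaked)
        self._updated = now
        if self._level + cost > self.capacity:
            return False
        self._level += cost
        return True
```

Heavy operations can pass a larger `cost`, so that one file upload counts for as much as several plain messages.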
Lessons learned
● Bursty traffic
  ○ Load testing is a must, but not always enough
    ■ Locust
    ■ Yandex Tank
● A reconnect storm could be a big deal
  ○ We should handle that both on the platform and on the client side
● AWS issues lead to a bad customer experience
  ○ Spread nodes across multiple Availability Zones (Multi-AZ)
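On the client side, a common way to soften a reconnect storm is exponential backoff with random jitter, so that clients dropped at the same moment do not all return at the same moment. The generator below is a sketch of the "full jitter" variant; the base delay, the cap and the number of attempts are illustrative defaults, not values from the talk.

```python
import random


def reconnect_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8,
                     rng=random.random):
    """Yield one randomized delay (in seconds) per reconnection attempt."""
    for attempt in range(attempts):
        # The ceiling doubles with every attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a point between zero and the ceiling.
        yield rng() * ceiling
```

A client would sleep for each yielded delay before the next connection attempt and stop iterating once it is back online.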
Lessons learned
● Incident prevention is cheaper than resolution
  ○ Grab stats and metrics about your services and storage
    ■ Redis for per-chat stats
    ■ StatsD
    ■ Grafana
  ○ Be notified when something starts going wrong
    ■ Elastalert
    ■ Monit
    ■ DataDog
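Collecting a timing metric does not need a client library: StatsD accepts plain-text UDP datagrams of the form `name:value|ms`. The sketch below wraps that in a context manager. The address uses StatsD's conventional port 8125 on localhost as an assumption, and `timed` and `format_timing` are illustrative names.

```python
import socket
import time
from contextlib import contextmanager

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed location of the StatsD daemon


def format_timing(metric: str, millis: int) -> bytes:
    """Encode a timer sample in the StatsD line format."""
    return f"{metric}:{millis}|ms".encode("ascii")


@contextmanager
def timed(metric: str, sock=None, addr=STATSD_ADDR):
    """Measure the wrapped block and send its duration as a StatsD timer."""
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = int((time.monotonic() - start) * 1000)
        try:
            sock.sendto(format_timing(metric, elapsed_ms), addr)
        except OSError:
            pass  # losing a metric must never break the request itself
        if own_socket:
            sock.close()
```

Wrapping a search handler in `with timed("search.request"):` is enough to get latency into a dashboard such as Grafana.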
Lessons learned
● Don't stick with one language/stack
  ○ Python is great, but for some cases Go, Ruby or PHP are more suitable from the product side
  ○ Avoid duplicating business logic across several repos: spin up a service and just call the endpoint
● Releasing new features only for certain groups makes product management easier
  ○ LaunchDarkly
Lessons learned
● Don’t F**k the Customer
  ○ Provide unit/integration tests with every PR
  ○ Have development environment same as prod
  ○ Have staging environment same as prod
  ○ Make deployments fast
  ○ Rollback faster
  ○ Have a fallback plan
Summary
● Select a messaging protocol which aligns with your needs
● WebSocket + JSON could be the thing for new projects
● Usage of asynchronous frameworks is preferred
● Execute blocking operations in a separate thread
● Collect metrics for common service operations
● Caching saves a lot of time
● Use C or Cython-based solutions for CPU-bound tasks
● Have a fast release/deploy/rollback cycle
● Python is great, but don't hesitate to pick other tools
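With asyncio, "execute blocking operations in a separate thread" usually means handing the call to an executor so the event loop keeps serving other connections. Below is a minimal sketch using password hashing as the blocking call. The pool size, the iteration count and the function names are illustrative, and the approach helps most when the call blocks in C code or on I/O.

```python
import asyncio
import hashlib
from concurrent.futures import ThreadPoolExecutor

# A small dedicated pool; the size is an arbitrary illustrative choice.
_executor = ThreadPoolExecutor(max_workers=4)


def hash_password(password: str, salt: bytes) -> bytes:
    """Deliberately slow key derivation: a blocking call."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


async def hash_password_async(password: str, salt: bytes) -> bytes:
    """Run the blocking call in a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, hash_password, password, salt)
```

A coroutine can then `await hash_password_async(...)` during authentication while other coroutines keep running on the loop.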
Further reading
● How HipChat Stores and Indexes Billions of Messages Using ElasticSearch
● @kakovskyi: Maintaining a high load Python project for newcomers
● HipChat: Important improvements to staging, presence & database storage
● HipChat and the little connection that could
● Elasticsearch at HipChat: 10x faster queries
● Atlassian: How IT and SRE use ChatOps to run incident management
● A Study of Internet Instant Messaging and Chat Protocols
● What Is Async, How Does It Work, And When Should I Use It?
● Leaky Bucket & Tocken Bucket - Traffic shaping
● A guide to analyzing Python performance
● Why Leading Companies Dark Launch - LaunchDarkly Blog
● @bmwant: Asyncio-stack for web development
Questions?
Viacheslav Kakovskyi
[email protected]
@kakovskyi