Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner.
Who am I?
• Master student of FI MU
• Member of LaSArIS
  – Webtops
  – Modern web applications
  – Cloud (and distributed) solutions
• First-time speaker at a conference
Social network systems
• Hundreds of millions of users => advanced software architecture and technologies
• High performance
• Scalability
• Billions of rows
Table of contents
• What and why?
  – Takeplace
• Which way?
  – Hadoop
  – HBase
  – Memcached
• How?
  – Architecture and design
• Was it worth it?
  – Testing
Takeplace
Takeplace and Social Networking
• Web-based service that facilitates organizing events, built around meeting, sharing and communication
• Emphasis on social and interpersonal interaction
• Easy tool for commenting on conferences (feedback)
• Professional user network: creates relations between the academic and professional worlds around common interests
• Analysis and statistics
• "To behave like Facebook with relations like Twitter and to be used as LinkedIn."
Functional requirements
• Entities can create asymmetric relations
• Posts
• Walls and news feed
• Comments and "like"
Technology requirements
• Linux and Cloud
• Data-oriented application
  – High throughput
  – Heavy loads
  – Concurrent requests
• Caching tool
Relational databases
• Fixed schema, ACID, indexes, joins
• Problems
  – Scaling up dataset size
  – Read/write concurrency
• Typical use of MySQL: Production => Memcached (losing ACID) => Costly server => Denormalizing => "materialize" most common queries => drop triggers, indexes
• (compromises or expensive)
HBase
• Inspired by Google BigTable
• Regions
• 4 dimensions
• "multidimensional sorted persistent distributed key-value map"
• Keys & values = arrays of bytes
• Row, CF (column family), Columns & Version
Example
{
  "aa" : {
    "cf" : {
      "c1" : data,
      "c2" : data
    },
    "cf2" : {
      "anyByteArray" : true
    }
  },
  "ab" : { … }
}
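The nested structure above can be mimicked in memory with sorted maps. The following is only a toy sketch, not HBase's API: the class name `SortedCellMap` and its methods are made up for illustration, and `String` keys stand in for the byte arrays the real store uses. It shows the four dimensions (row, column family, column, version) and why sorted row keys make range scans cheap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (assumption, not the HBase API): row -> column family -> column -> version -> value.
final class SortedCellMap {
    private final NavigableMap<String,
            NavigableMap<String,
                    NavigableMap<String,
                            NavigableMap<Long, String>>>> rows = new TreeMap<>();

    // Stores one versioned cell, creating the intermediate maps on demand.
    void put(String row, String family, String column, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(version, value);
    }

    // Returns the value with the highest version, or null if the cell does not exist.
    String getLatest(String row, String family, String column) {
        var families = rows.get(row);
        if (families == null) return null;
        var columns = families.get(family);
        if (columns == null) return null;
        var versions = columns.get(column);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Rows are kept sorted by key, so a range scan is just a sub-map view.
    List<String> scanRowKeys(String fromInclusive, String toExclusive) {
        return new ArrayList<>(rows.subMap(fromInclusive, true, toExclusive, false).keySet());
    }
}
```

With rows "aa", "ab" and "b" stored, `scanRowKeys("a", "b")` returns the keys "aa" and "ab" in sorted order.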
Hadoop
• Software framework: the backbone of the distributed environment
• MapReduce
• HDFS
HBase
• No real indexes
• Automatic partitioning
• Scales linearly and automatically
• Parallel
• Cheap
• Not for everyone
• Write once, read many
• Built on top of Hadoop
Memcached
• Distributed cache
• Typical usage

// Ask Memcached first; on a miss, read from the database and cache the result.
public Data getData(String query) {
    Data data = memcached.get(query);
    if (data == null) {
        data = database.get(query);
        memcached.set(query, data);
    }
    return data;
}
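The snippet above leaves `memcached` and `database` undefined. A self-contained sketch of the same pattern follows, with in-memory stand-ins for both stores. The class name `CacheAsideDemo`, the `databaseReads` counter and the use of a `Function` as the database are assumptions made for illustration, not part of any Memcached client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside lookup with in-memory stand-ins (assumptions) for Memcached and the database.
final class CacheAsideDemo {
    private final Map<String, String> cache = new HashMap<>();  // stands in for Memcached
    private final Function<String, String> database;            // stands in for the database
    int databaseReads = 0;                                       // counts cache misses

    CacheAsideDemo(Function<String, String> database) {
        this.database = database;
    }

    String getData(String query) {
        String data = cache.get(query);
        if (data == null) {
            // Cache miss: read from the slower store and remember the answer.
            databaseReads++;
            data = database.apply(query);
            if (data != null) {
                cache.put(query, data);
            }
        }
        return data;
    }
}
```

Calling `getData("wall:42")` twice on a fresh instance leaves `databaseReads` at 1, because the second call is served from the cache.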
Architecture
Architecture (2)
• To be used in any system
• Interface of services (REST, SOAP, …)
• User tables
• Services: Follow, Wall, Like and Discussion
• Security
Architecture (3)
User ID transformation
Data!
• Three tables
• Entities
  – Followers, Following, Blocked, Count, News
• Walls
  – Info, text, likes
• Discussions (similar to Walls)
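Without joins or indexes, an asymmetric relation has to be written in both directions by the application itself. The sketch below assumes that Followers, Following and Blocked are separate groups of columns per entity and models them as in-memory maps. The class `FollowGraphSketch` and all its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the Entities table: each relation is stored twice,
// once per direction, because there are no joins to derive one side from the other.
final class FollowGraphSketch {
    private final Map<String, Set<String>> followers = new HashMap<>();
    private final Map<String, Set<String>> following = new HashMap<>();
    private final Map<String, Set<String>> blocked = new HashMap<>();

    // Returns false if the target has blocked the follower; otherwise records both directions.
    boolean follow(String follower, String target) {
        if (blocked.getOrDefault(target, Set.of()).contains(follower)) {
            return false;
        }
        following.computeIfAbsent(follower, f -> new HashSet<>()).add(target);
        followers.computeIfAbsent(target, t -> new HashSet<>()).add(follower);
        return true;
    }

    // Blocking also removes an existing relation, again on both sides.
    void block(String entity, String unwanted) {
        blocked.computeIfAbsent(entity, e -> new HashSet<>()).add(unwanted);
        followers.getOrDefault(entity, new HashSet<>()).remove(unwanted);
        following.getOrDefault(unwanted, new HashSet<>()).remove(entity);
    }

    boolean isFollowing(String follower, String target) {
        return following.getOrDefault(follower, Set.of()).contains(target);
    }

    int followerCount(String entity) {
        return followers.getOrDefault(entity, Set.of()).size();
    }
}
```

The relation is asymmetric: after `follow("bob", "alice")`, `isFollowing("alice", "bob")` is still false.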
Storing data
• Row IDs! Performance!
• Lexically
• Sequence scanner
• UID (constant length)
• yyyymmddhhmmssSSS
• Inverted bytes -> newest to oldest
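One way to read the row ID scheme above is sketched here: a fixed-width UID followed by the timestamp, with every timestamp byte bitwise inverted so that a lexical scan returns the newest rows first. The width of 10 digits, the class name `RowKeys` and the exact byte layout are assumptions; the slides only give the ingredients.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical row key layout: [UID, fixed width][inverted yyyyMMddHHmmssSSS].
final class RowKeys {
    static final int UID_LENGTH = 10; // assumed width; UIDs must be non-negative and fit in it
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS", Locale.ROOT);

    static byte[] rowKey(long uid, LocalDateTime postedAt) {
        // Zero-padding keeps every UID the same length, so keys of one user stay adjacent.
        byte[] uidBytes = String.format(Locale.ROOT, "%0" + UID_LENGTH + "d", uid)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] stamp = postedAt.format(STAMP).getBytes(StandardCharsets.US_ASCII);

        byte[] key = Arrays.copyOf(uidBytes, uidBytes.length + stamp.length);
        for (int i = 0; i < stamp.length; i++) {
            // Inverting each byte reverses the sort order of the timestamp part.
            key[uidBytes.length + i] = (byte) ~stamp[i];
        }
        return key;
    }
}
```

Comparing two keys of the same user with `Arrays.compareUnsigned` puts the later timestamp first, which is the order a sequential scanner would walk them in.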
News feed
  – One by one (slow)
• OR
  – Store news at each profile (great redundancy)
• MEMCACHED!
• Post put in DB => search followers => store minimized in Memcached => links to news feed => 1 normal query & 1 batch query to Memcached
• TTL (LRU)
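The write and read paths above can be sketched as follows, with plain maps standing in for the database and for Memcached. Everything named here (`NewsFeedSketch`, `publish`, `readFeed`, and truncation as the meaning of "minimized") is an assumption for illustration; TTL and LRU eviction are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical fan-out-on-write news feed; all four maps are in-memory stand-ins.
final class NewsFeedSketch {
    private final Map<String, String> database = new HashMap<>();           // postId -> full post
    private final Map<String, Set<String>> followers = new HashMap<>();     // author -> followers
    private final Map<String, String> cachedPosts = new HashMap<>();        // postId -> minimized post
    private final Map<String, Deque<String>> cachedFeeds = new HashMap<>(); // user -> post ids, newest first

    void follow(String follower, String author) {
        followers.computeIfAbsent(author, a -> new HashSet<>()).add(follower);
    }

    // Post put in DB => search followers => store minimized in cache => link into each feed.
    void publish(String author, String postId, String text) {
        database.put(postId, text);
        cachedPosts.put(postId, minimize(text));
        for (String follower : followers.getOrDefault(author, Set.of())) {
            cachedFeeds.computeIfAbsent(follower, f -> new ArrayDeque<>()).addFirst(postId);
        }
    }

    // One normal query for the list of links, then one batch lookup for the posts.
    List<String> readFeed(String user) {
        Deque<String> ids = cachedFeeds.getOrDefault(user, new ArrayDeque<>());
        List<String> posts = new ArrayList<>();
        for (String id : ids) { // stands in for a single multi-get
            String post = cachedPosts.get(id);
            if (post != null) {
                posts.add(post); // entries evicted from the cache simply drop out
            }
        }
        return posts;
    }

    // "Minimized" is assumed to mean a shortened preview of the post.
    private static String minimize(String text) {
        return text.length() <= 40 ? text : text.substring(0, 40);
    }
}
```

The trade-off is the one the slide names: each post is linked into every follower's feed at write time, so reading a feed never has to visit the followed profiles one by one.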
Conclusion
• Pros
  – High volume data distribution
  – Scalability
  – High throughput
  – Heavy data load (write once, read many)
• Cons
  – Losing relations, indexes, triggers, …
  – Responsibility for consistent data
  – Still not sure how it will behave when deployed in production