Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner.

Inner Architecture of a Social Networking System

Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner

Who am I?

• Master's student at FI MU
• Member of LaSArIS
– Webtops
– Modern web applications
– Cloud (and distributed) solutions
• First-time conference speaker

Social network systems

• Hundreds of millions of users => advanced software architecture and technologies
• High performance
• Scalability
• Billions of rows

Table of contents

• What and why?
– Takeplace
• Which way?
– Hadoop
– HBase
– Memcached
• How?
– Architecture and design
• Was it worth it?
– Testing

Takeplace

Takeplace and Social Networking

• Web-based service that facilitates organizing events, built on meeting, sharing and communication
• Emphasis on social and interpersonal interaction
• Easy tool for commenting on conferences (feedback)
• Professional user network: creates relations between the academic and professional worlds around common interests
• Analysis and statistics
• "To behave like Facebook with relations like Twitter and to be used as LinkedIn."

Functional requirements

• Entities can create asymmetric relations
• Posts
• Walls and news feed
• Comments and "like"

Technology requirements

• Linux and Cloud
• Data-oriented application
– High throughput
– Heavy loads
– Concurrent requests
• Caching tool

Relational databases

• Fixed schema, ACID, indexes, joins
• Problems
– Scaling up dataset size
– Read/write concurrency
• Typical evolution of a MySQL deployment: production => Memcached (losing ACID) => costlier server => denormalization => "materializing" the most common queries => dropping triggers and indexes
• Result: either compromises or high cost

HBase

• Inspired by Google BigTable
• Regions
• 4 dimensions
• "Multidimensional sorted persistent distributed key-value map"
• Keys and values are arrays of bytes
• Row, column family (CF), column and version

Example

{
  "aa" : {
    "cf" : {
      "c1" : data,
      "c2" : data
    },
    "cf2" : {
      "anyByteArray" : true
    }
  },
  "ab" : { … }
}
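The "multidimensional sorted map" idea can be sketched with nested sorted maps in plain Java. This is only an in-memory illustration, not the HBase client API: the class `SortedCellMap` and its methods are hypothetical names, and it uses `String` keys for brevity where HBase uses byte arrays.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory model only; this is NOT the HBase client API.
// Assumption: String keys stand in for the byte-array keys HBase uses.
class SortedCellMap {
    // row -> column family -> column -> version (timestamp) -> value
    private final NavigableMap<String,
            NavigableMap<String,
                    NavigableMap<String,
                            NavigableMap<Long, byte[]>>>> rows = new TreeMap<>();

    // Stores one cell value under the four coordinates.
    void put(String row, String family, String column, long version, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(version, value);
    }

    // Returns the value with the highest version, or null if the cell is absent.
    byte[] getLatest(String row, String family, String column) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = families.get(family);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(column);
        if (versions == null || versions.isEmpty()) return null;
        return versions.lastEntry().getValue();
    }

    // Rows are kept sorted by key, so the first row is the smallest key.
    String firstRow() {
        return rows.isEmpty() ? null : rows.firstKey();
    }
}
```

Because every level is sorted, a range scan over adjacent row keys is cheap, which is why the choice of row ID matters so much for performance.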

Hadoop

• Software framework: the backbone of the distributed environment
• MapReduce
• HDFS

HBase

• No real indexes
• Automatic partitioning
• Scales linearly and automatically
• Parallel
• Cheap
• Not for everyone
• Write once, read many
• Built on top of Hadoop

Memcached

• Distributed cache
• Typical usage

    public Data getData(String query) {
        // Try the cache first.
        Data data = memcached.get(query);
        if (data == null) {
            // Cache miss: read from the database and cache the result.
            data = database.get(query);
            memcached.set(query, data);
        }
        return data;
    }

Architecture

Architecture (2)

• To be used in any system
• Interface of services (REST, SOAP, …)
• User tables
• Services: Follow, Wall, Like and Discussion
• Security

Architecture (3)

User ID transformation

Data!

• Three tables
• Entities
– Followers, Following, Blocked, Count, News
• Walls
– Info, text, likes
• Discussions (similar to Walls)
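The asymmetric relations kept for each entity (followers, following, blocked, counts) might be modelled roughly as below. This is a hedged in-memory sketch, not the system's storage code: `EntityRelations` and its methods are hypothetical, and the rule that blocking removes an existing follow is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the per-entity relation data.
class EntityRelations {
    private final Map<String, Set<String>> following = new HashMap<>();
    private final Map<String, Set<String>> followers = new HashMap<>();
    private final Map<String, Set<String>> blocked = new HashMap<>();

    private static Set<String> get(Map<String, Set<String>> map, String key) {
        return map.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    // Asymmetric: "from follows to" does not imply "to follows from".
    // The edge is stored in both directions so either side reads it in one lookup.
    boolean follow(String from, String to) {
        if (get(blocked, to).contains(from)) return false; // blocked entities cannot follow
        get(following, from).add(to);
        get(followers, to).add(from);
        return true;
    }

    // Assumption: blocking also drops an existing follow from target to owner.
    void block(String owner, String target) {
        get(blocked, owner).add(target);
        get(followers, owner).remove(target);
        get(following, target).remove(owner);
    }

    boolean isFollowing(String from, String to) {
        return get(following, from).contains(to);
    }

    int followerCount(String id) {
        return get(followers, id).size();
    }
}
```

Storing each relation twice trades redundancy for read speed, the same trade-off the deck accepts when giving up joins.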

Storing data

• Row IDs! Performance!
• Rows are sorted lexicographically
• Sequence scanner
• UID (constant length)
• Timestamp: yyyymmddhhmmssSSS
• Inverted bytes -> rows sort from newest to oldest
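One way to read the row ID scheme above is: a fixed-length UID prefix followed by the timestamp string with every byte inverted, so that an ascending lexicographic scan returns the newest rows first. The sketch below follows that reading; `RowKeys`, `rowKey` and `compare` are hypothetical names, and the exact key layout used in the real system is not given in the slides.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

// Hypothetical helper; the key layout is an assumption based on the slide.
class RowKeys {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Builds: constant-length UID bytes + bitwise-inverted timestamp bytes.
    static byte[] rowKey(String uid, LocalDateTime postedAt) {
        byte[] prefix = uid.getBytes(StandardCharsets.US_ASCII);
        byte[] stamp = STAMP.format(postedAt).getBytes(StandardCharsets.US_ASCII);
        byte[] key = Arrays.copyOf(prefix, prefix.length + stamp.length);
        for (int i = 0; i < stamp.length; i++) {
            // Inverting each byte reverses the sort order of the timestamps.
            key[prefix.length + i] = (byte) ~stamp[i];
        }
        return key;
    }

    // Unsigned lexicographic comparison, the order a sorted byte-keyed store uses.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }
}
```

With keys built this way, all rows of one UID are adjacent, and a sequential scan starting at the UID prefix yields that entity's posts from newest to oldest without any index.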

News feed

• Naive options
– Query followed entities one by one (slow)
– OR store news at each profile (great redundancy)
• Solution: Memcached!
• Post put in DB => search followers => store minimized post in Memcached => add links to followers' news feeds => reading a feed takes 1 normal query and 1 batch query to Memcached
• TTL (LRU)
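The write and read paths of the news feed can be sketched as follows. Plain maps stand in for HBase and Memcached, so this is an illustration of the flow rather than the real implementation: the class `NewsFeedSketch`, the 40-character `minimize` rule and the omission of TTL/LRU eviction are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: HashMaps stand in for the database and for Memcached.
class NewsFeedSketch {
    final Map<String, String> database = new HashMap<>();        // postId -> full post
    final Map<String, String> postCache = new HashMap<>();       // postId -> minimized post
    final Map<String, List<String>> feedCache = new HashMap<>(); // userId -> post links, newest first
    final Map<String, List<String>> followers = new HashMap<>(); // userId -> followers

    void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new ArrayList<>()).add(follower);
    }

    // Write path: DB => find followers => cache minimized post => link into each feed.
    void publish(String author, String postId, String text) {
        database.put(postId, text);
        postCache.put(postId, minimize(text));
        for (String follower : followers.getOrDefault(author, Collections.emptyList())) {
            feedCache.computeIfAbsent(follower, k -> new ArrayList<>()).add(0, postId);
        }
    }

    // Read path: one normal query for the links, then one batch query for the posts.
    List<String> readFeed(String user) {
        List<String> links = feedCache.getOrDefault(user, Collections.emptyList());
        List<String> posts = new ArrayList<>();
        for (String id : links) { // stands in for a single multi-get
            String post = postCache.get(id);
            if (post != null) posts.add(post); // entries evicted from the cache are skipped
        }
        return posts;
    }

    // Assumption: "minimized" means a short preview of the post text.
    static String minimize(String text) {
        return text.length() <= 40 ? text : text.substring(0, 40);
    }
}
```

Each feed holds only links, so a post is cached once no matter how many followers see it, which avoids the redundancy of storing news at every profile.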

Conclusion

• Pros
– High-volume data distribution
– Scalability
– High throughput
– Heavy data load (write once, read many)
• Cons
– Losing relations, indexes, triggers, …
– Responsibility for data consistency
– Still unsure how it will behave when deployed in production
