
mixi.jp: Scaling out with open source

Batara Kesuma, mixi, Inc. (bkesuma@mixi.co.jp)

Introduction

• Batara Kesuma
• CTO of mixi, Inc.

What is mixi?

• Social networking service
  • Diary, community, message, review, photo album, etc.

• Invitation only

• Largest and fastest-growing SNS in Japan

[Screenshot of the mixi home page: latest information (friends' new diaries, comment history, community topics, friends' new reviews, friends' new albums), my latest diaries and reviews, user testimonials, friends list, and community listing]

History of mixi

• Development started in December 2003
  • Only 1 engineer (me)

• 4 months of coding

• Opened in February 2004

Two months later

• 10,000 users
• 600,000 PV/day

The “Oh crap!” factor

• This model works
• But how do we scale out?

The first year

• The online population of mixi grew significantly

• From 600 users to 210,000 users

The second year

• From 210,000 users to 2 million users

And now?

More than 3.7 million users
15,000 new users/day

Population of Japan: 127 million
Internet users: 86.7 million

Source: CIA Factbook

70% of users are active (last login less than 72 hours ago)

The average user spends 3 hours 20 minutes per week on mixi

Ranked 35th worldwide on Alexa, and 3rd in Japan

[Chart: PV growth over 2 years, comparing mixi with Google Japan and Amazon Japan]

[Chart: user growth over 2 years, from near zero in 04/03 to about 3.5 million users in 06/03]

Our technology solutions

The technology behind

• Linux 2.6
• Apache 2.0
• MySQL
• Perl 5.8
• memcached
• Squid

[Architecture diagram: requests go through mod_proxy (which also serves images) to mod_perl, which fetches hot objects from memcached and queries the diary cluster, message cluster, and other cluster]

Powered by MySQL

• More than 100 MySQL servers
• Adding more than 10 servers/month
• Non-persistent connections
• Mostly InnoDB
• Heavy reliance on DB partitioning (our own solution)

DB replication

• When MySQL server load gets heavy
• Add more slaves

[Diagram: mod_perl sends write queries to the master DB and read queries to a slave DB; the master replicates writes to the slave]
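As a concrete illustration of that read/write split, here is a minimal Perl/DBI sketch; the host names, credentials, and the diary table used here are assumptions, not taken from the slides.

use strict;
use warnings;
use DBI;

# Writes go to the master; reads can go to any slave (hosts are hypothetical)
my $master = DBI->connect('DBI:mysql:host=master;database=mixi',
                          'user', 'pass', { RaiseError => 1 });
my @slaves = map {
    DBI->connect("DBI:mysql:host=$_;database=mixi",
                 'user', 'pass', { RaiseError => 1 })
} qw(slave1 slave2);

# QUERY (WRITE): only the master accepts writes, which replicate to the slaves
$master->do('INSERT INTO diary (member_id, subject) VALUES (?, ?)',
            {}, 14, 'Hello mixi');

# QUERY (READ): pick a slave to spread the read load
my $slave = $slaves[ int rand @slaves ];
my $rows  = $slave->selectall_arrayref(
    'SELECT subject FROM diary WHERE member_id = ?', {}, 14);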

DB replication

• The classic problem with DB replication

[Diagram: with a total load of 100 reads/s and 50 writes/s, two slaves each serve 50 reads/s and four slaves each serve 25 reads/s, but the master and every slave still have to apply all 50 writes/s; adding slaves scales reads only, not writes]

Some statistics

• Diary-related tables
  • Read 85%, write 15%
• Message-related tables
  • Read 75%, write 25%

DB partitioning

• Replication couldn’t keep up anymore

• Try to split the DB

How to split?

[Diagram: a single DB holding the message tables, diary tables, and other tables for users A, B, and C]

Splitting vertically by users, or splitting horizontally by table types

Vertical partition

[Diagram: the DB is split by users into DB 1 and DB 2, each holding its own message, diary, and other tables for its subset of users]

Vertical partition

• Too many tables to deal with at one time

• The transition in splitting gets complex and difficult

Horizontal partition

Also called level 1 partitioning within mixi

[Diagram: the message tables and the diary tables each move out of the old DB into a new DB of their own; the other tables stay in the old DB]

Old DB (other tables):    $dbh = $db->load_dbh();
New DB (message tables):  $dbh = $db->load_dbh(type => "message");
New DB (diary tables):    $dbh = $db->load_dbh(type => "diary");

Partition map for level 1

• Small and static
• Just put it in a configuration file
• For example:

$DB_DIARY   = 'DBI:mysql:host=db1;database=diary';
$DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
...
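A minimal sketch of what a level 1 load_dbh() could look like on top of a configuration map like the one above; the package layout, the default DSN, and the credentials are assumptions (the slides only show the call sites and the config entries).

package DB;

use strict;
use warnings;
use DBI;

my %dsn_for = (
    diary   => 'DBI:mysql:host=db1;database=diary',
    message => 'DBI:mysql:host=db2;database=message',
    default => 'DBI:mysql:host=db0;database=mixi',   # hypothetical old DB
);

sub load_dbh {
    my ($self, %args) = @_;
    my $dsn = $dsn_for{ $args{type} || 'default' } || $dsn_for{default};

    # Non-persistent connection, as noted earlier in the slides
    return DBI->connect($dsn, 'user', 'pass', { RaiseError => 1 });
}

1;

With this, $db->load_dbh(type => "message") resolves to db2 exactly as on the slide, and calls without a type fall back to the old DB.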

Easy transition

[Diagram: during the transition, mod_perl writes to both the old and the new DB while reads still go to the old DB]

1. Write to both DBs
2. Copy in the background (SELECT from the old DB, INSERT IGNORE into the new DB)
3. Shift reads to the new DB
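A rough sketch of the background copy step (2), assuming a message table keyed by an auto-increment message_id; the table and column names are hypothetical, and the slides only specify the SELECT plus INSERT IGNORE pattern.

use strict;
use warnings;
use DBI;

my $old = DBI->connect('DBI:mysql:host=olddb;database=mixi',
                       'user', 'pass', { RaiseError => 1 });
my $new = DBI->connect('DBI:mysql:host=db2;database=message',
                       'user', 'pass', { RaiseError => 1 });

my $insert = $new->prepare(
    # IGNORE silently skips rows the live dual-write path already copied
    'INSERT IGNORE INTO message (message_id, from_id, to_id, body)
     VALUES (?, ?, ?, ?)'
);

my $last_id = 0;
while (1) {
    # Copy in batches so the live site is not disturbed
    my $rows = $old->selectall_arrayref(
        'SELECT message_id, from_id, to_id, body FROM message
          WHERE message_id > ? ORDER BY message_id LIMIT 1000',
        {}, $last_id);
    last unless @$rows;

    $insert->execute(@$_) for @$rows;
    $last_id = $rows->[-1][0];
}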

Problems with level 1

• Cannot use JOINs anymore
  • Use FEDERATED tables from MySQL 5
  • Or do two SELECTs, which is faster than using FEDERATED tables
  • If the table is small, just duplicate it
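For example, a query that used to JOIN diary rows with member nicknames can be rewritten as two SELECTs against the two DBs. The diary and member table/column names here are hypothetical, and load_dbh() is the helper shown earlier.

my $member_id = 14;   # example value

# Before level 1: a single JOIN inside one DB, e.g.
#   SELECT d.subject, m.nickname
#     FROM diary d JOIN member m ON m.member_id = d.member_id
#    WHERE d.member_id = ?

# After level 1: diary and member live in different DBs, so SELECT twice
# (assuming $db is the application's DB helper object from the earlier slides)
my $diary_dbh  = $db->load_dbh(type => 'diary');
my $member_dbh = $db->load_dbh();

my $diaries = $diary_dbh->selectall_arrayref(
    'SELECT diary_id, subject FROM diary WHERE member_id = ?',
    { Slice => {} }, $member_id);

my ($nickname) = $member_dbh->selectrow_array(
    'SELECT nickname FROM member WHERE member_id = ?',
    {}, $member_id);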

Next step

•When the new DB gets overloaded

• We split the DB, yet again
• Get ready for level 2

Partitioning key

• user id or message id
• Choose wisely!

[Diagram: the message tables can be split either by user id (user A vs. user B) or by message id]

Level 2 partition

[Diagram: the message tables in the LEVEL 1 DB are split by user into new DBs: users A and B go to the message tables on NODE 1, users C and D to the message tables on NODE 2]

Partition map for level 2

• Big and dynamic
• Cannot put it all in a configuration file

Partition map for level 2

• Manager based
  • Use another DB to do the partition mapping
• Algorithm based
  • Partition map is computed inside the application
  • node_id = member_id % TOTAL_NODE

Manager based

[Diagram: message tables on NODE 1, NODE 2, and NODE 3, plus a MANAGER DB. For user_id=14, mod_perl (1) asks the manager for the node_id, (2) the manager returns node_id=2, and (3) mod_perl connects to that node]
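A minimal sketch of the manager-based lookup, assuming a partition_map table (user_id, node_id) in the manager DB and per-node host names; none of these names appear in the slides.

use strict;
use warnings;
use DBI;

sub load_dbh_for_user {
    my ($user_id) = @_;

    my $manager = DBI->connect('DBI:mysql:host=manager;database=map',
                               'user', 'pass', { RaiseError => 1 });

    # 1. Ask the manager which node holds this user's message tables
    my ($node_id) = $manager->selectrow_array(
        'SELECT node_id FROM partition_map WHERE user_id = ?',
        {}, $user_id);

    # 2.-3. The manager returned node_id; connect to that node
    return DBI->connect("DBI:mysql:host=node$node_id;database=message",
                        'user', 'pass', { RaiseError => 1 });
}

my $dbh = load_dbh_for_user(14);   # e.g. returns a handle to NODE 2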

Algorithm based

[Diagram: message tables on NODE 1, NODE 2, and NODE 3. With 3 nodes, mod_perl (1) computes node_id = (user_id % 3) + 1 itself (here node_id=3) and (2) connects directly to that node]
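The algorithm-based version needs no manager round trip; this sketch follows the formula on the slide (the host names are assumptions).

use strict;
use warnings;
use DBI;

use constant TOTAL_NODES => 3;

sub load_dbh_for_user {
    my ($user_id) = @_;

    # 1. Compute the node id locally: node_id = (user_id % TOTAL_NODES) + 1
    my $node_id = ($user_id % TOTAL_NODES) + 1;

    # 2. Connect directly to that node
    return DBI->connect("DBI:mysql:host=node$node_id;database=message",
                        'user', 'pass', { RaiseError => 1 });
}

my $dbh = load_dbh_for_user(14);   # (14 % 3) + 1 = 3, so NODE 3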

Manager based

• Pros:
  • Easy to manage
  • Easy to add a new node or to move data between nodes
• Cons:
  • Every request needs 1 extra query to fetch the partition map
  • It needs to send a request to the manager

Algorithm based

• Cons:
  • Difficult to manage
  • Adding new nodes is tricky
• Pros:
  • Application servers can compute the node id by themselves
  • Bypasses the connection to the manager

Adding nodes is tricky

[Diagram: growing from 2 nodes to 4 nodes. With 2 nodes, old_node_id = (member_id % 2) + 1; with 4 nodes, new_node_id = (member_id % 4) + 1]

1. Add the new application logic
2. Write to both DBs if the old and new node_id differ
3. Copy in the background
4. Shift reads
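A sketch of the dual-write step (2) while growing from 2 to 4 nodes, following the formulas above; write_message() stands in for the real write path and is hypothetical.

use strict;
use warnings;

use constant OLD_NODES => 2;
use constant NEW_NODES => 4;

sub store_message {
    my ($member_id, $message) = @_;

    my $old_node = ($member_id % OLD_NODES) + 1;
    my $new_node = ($member_id % NEW_NODES) + 1;

    # The old node still serves reads during the transition
    write_message($old_node, $member_id, $message);

    # Also write to the new node when the mapping differs, so it is already
    # up to date once the background copy finishes and reads are shifted
    write_message($new_node, $member_id, $message)
        if $new_node != $old_node;
}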

Problems with level 2

[Diagram: member tables spread over NODE 1, NODE 2, and NODE 3; community tables over NODE 1 and NODE 2]

• Too many connections to different DBs
• Fortunately, on mixi, the majority are small data sets
• Cache them all by using distributed memory caching
• We rarely hit the DB
• Average page load time is about 0.02 sec*

* average load time may vary depending on the data sets

Caching

• memcached
  • Also used by LiveJournal, Slashdot, etc.
• Install the server on the mod_perl machines
• 39 machines x 2 GB of memory
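A cache-aside sketch with the Cache::Memcached client, matching the "hot objects in memcached, rarely hit the DB" pattern above; the key format, member table, and TTL are assumptions.

use strict;
use warnings;
use Cache::Memcached;

# memcached runs on the mod_perl machines themselves
my $memd = Cache::Memcached->new({
    servers => [ 'app1:11211', 'app2:11211' ],
});

sub get_member {
    my ($db, $member_id) = @_;
    my $key = "member:$member_id";

    # Serve hot objects straight from memcached
    my $member = $memd->get($key);
    return $member if $member;

    # Cache miss: fall back to the partitioned DB, then cache the row
    my $dbh = $db->load_dbh();
    $member = $dbh->selectrow_hashref(
        'SELECT member_id, nickname FROM member WHERE member_id = ?',
        {}, $member_id);
    $memd->set($key, $member, 3600);   # arbitrary 1-hour TTL

    return $member;
}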

Summary of DB partitioning

• Level 1 partition (split by table types)
• Level 2 partition (split by partitioning key)
  • Manager based
  • Algorithm based

[Diagram: (1) level 1 splits the old DB by table types, so the message tables move out of the DB holding the diary and other tables; (2) level 2 then splits the message tables by partitioning key (users A, B, C) across multiple LEVEL 2 nodes]

Image Servers

Statistics

• Total size is more than 8 TB of storage
• Growth rate is about 23 GB/day
• We use MySQL to store metadata only

Two types of images

• Frequently accessed images
  • The number of image files is relatively small (a few million files)
  • For example, user profile photos and community logos
• Rarely accessed images
  • Hundreds of millions of image files
  • Diary photos, album photos, etc.

Frequently accessed images

• A few hundred GB of files
• Distributed via FTP and Squid
• Third-party Content Delivery Network

Frequently accessed images

[Diagram: mod_perl (1) uploads images to the storage servers (sto1.mixi.jp, sto2.mixi.jp); Squid and the CDN (2) pull images from storage]

Rarely accessed images

• A few TB of files
• Newer files get accessed more often
• Cache hit ratio is very bad
• Distributed directly from storage

Uploading rarely accessed images

[Diagram: to upload abc.gif, mod_perl (1) asks the MANAGER DB to assign an id for the image file, (2) the manager arranges a pair of area_ids (here area_id=1,2), and (3) mod_perl uploads the image to both corresponding storage servers (sto1.mixi.jp and sto2.mixi.jp)]
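A sketch of those three upload steps; the image metadata table, its columns, the load_dbh() type, and upload_to_storage() are hypothetical stand-ins, since the slides only describe the flow.

use strict;
use warnings;

sub store_image {
    my ($db, $filename, $data) = @_;
    my $manager = $db->load_dbh(type => 'image_manager');   # hypothetical type

    # 1. Assign an id for the image file
    $manager->do('INSERT INTO image (filename) VALUES (?)', {}, $filename);
    my $image_id = $manager->{mysql_insertid};

    # 2. Arrange a pair of area_ids (each image is stored on two servers)
    my @area_ids = (1, 2);   # e.g. chosen by free space or round robin
    $manager->do('UPDATE image SET area_ids = ? WHERE image_id = ?',
                 {}, join(',', @area_ids), $image_id);

    # 3. Upload the image to both storage servers
    upload_to_storage("sto$_.mixi.jp", $filename, $data) for @area_ids;

    return $image_id;
}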

Viewing rarely accessed images

[Diagram: viewing a rarely accessed image]

1. The user asks for view_diary.pl
2. mod_perl detects abc.gif in view_diary.pl
3. mod_perl asks the MANAGER DB for the area_id of abc.gif
4. The manager returns area_id=1
5. mod_perl creates the image URL
6. mod_perl returns view_diary.pl and the URL for abc.gif
7. The user asks the storage server (sto1.mixi.jp) for abc.gif
8. The storage server returns abc.gif
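And the matching lookup when rendering a page (steps 3 to 5): the manager DB maps the file to an area_id, and the image URL points straight at that storage server. The table and column names are hypothetical.

use strict;
use warnings;

sub image_url {
    my ($db, $filename) = @_;
    my $manager = $db->load_dbh(type => 'image_manager');   # hypothetical type

    # 3.-4. Ask the manager which area(s) hold this file, take the first one
    my ($area_ids) = $manager->selectrow_array(
        'SELECT area_ids FROM image WHERE filename = ?', {}, $filename);
    my ($area_id)  = split /,/, $area_ids;

    # 5. Images are served directly from that storage host
    return "http://sto$area_id.mixi.jp/$filename";
}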

To do

• Try MySQL Cluster
• Try to implement a better algorithm
  • Consistent hashing?
  • Linear hashing?
• Level 3 partitioning?
  • Split again by timestamp?
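For reference, a tiny consistent-hashing sketch of the kind the "better algorithm" item could point to (entirely an assumption; the slides only name the idea): with a hash ring, adding a node only remaps the keys between it and its neighbour, instead of reshuffling almost every user as plain modulo does.

use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my @nodes = qw(node1 node2 node3);

# Place each node at 100 points on a 32-bit ring to smooth the distribution
my %ring;
for my $node (@nodes) {
    $ring{ hex substr(md5_hex("$node-$_"), 0, 8) } = $node for 1 .. 100;
}
my @points = sort { $a <=> $b } keys %ring;

sub node_for_user {
    my ($user_id) = @_;
    my $h = hex substr(md5_hex($user_id), 0, 8);

    # Walk clockwise to the first node point at or after the key's position
    for my $p (@points) {
        return $ring{$p} if $p >= $h;
    }
    return $ring{ $points[0] };   # wrap around the ring
}

my $node = node_for_user(14);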

Questions?

Thank you

• Further questions to bkesuma@mixi.co.jp

• We are hiring :)
• Have a nice day!