
Zing Database – Distributed Key-Value Database

Nguyễn Quang Nam
Zing Web-Technical Team

Content

- Why
- Introduction
- Overview architecture
- Single Server/Storage
- Distribution

Introduction

Some statistics:

- Feeds: 1.6 B items, 700 GB of disk across 4 DB instances, 8 caching servers, 136 GB of memory cache in use.

- User Profiles: 44.5 M registered accounts, 2 database instances, 30 GB of memory cache.

- Comments: 350 M items, 50 GB of disk across 2 DB instances, 20 GB of memory cache.

Why

Access time

L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns
Mutex lock/unlock                           100 ns
Main memory reference                       100 ns
Compress 1K bytes with Zippy             10,000 ns
Send 2K bytes over 1 Gbps network        20,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Disk seek                            10,000,000 ns
Read 1 MB sequentially from network  10,000,000 ns
Read 1 MB sequentially from disk     30,000,000 ns
Send packet CA->Netherlands->CA     150,000,000 ns

by Jeff Dean (http://labs.google.com/people/jeff)

Standard & Real Requirements

- Time to load a page: < 200 ms
- Read data rate: ~12K ops/sec
- Write data rate: ~8K ops/sec
- Caching service/database recovery time: < 5 mins
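A quick sanity check against the access times above shows why a cache tier is unavoidable: at ~10 ms per seek, a single disk serves on the order of 100 random reads/sec, so ~12K reads/sec would take over a hundred spindles if served from disk alone, while a main-memory reference at ~100 ns is roughly five orders of magnitude cheaper.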

Existing solutions

- RDBMS (MySQL, MSSQL): writes are too slow; reads are acceptable on a small DB, far too slow on a huge one

- Cassandra (by Facebook): difficult to operate and maintain, and performance is not good enough

- HBase/Hadoop: we use this for the log system

- MongoDB, Membase, Tokyo Tyrant, ...: fine! We use these in several cases, but they are not suitable for everything

Overview architecture

[Architecture diagram] Three layers connect an API request to its data:

- Transport layer: ZNonblockingServer, receiving API requests over TCP
- Model (business) layer: loads the configuration, creates & manages the backend storages, implements the business rules
- Storage layer:
  - Memory storage: LRU ICache (RW)
  - Persistent storage: Commitlog Storage (W) and ZiDB Storage (RW) on the local database disk
  - Remote storage: Remote Storage (RW), backed by a remote system

Server/Storage

ZNonblockingServer

- Based on TNonblockingServer (Apache Thrift)
- 185K reqs/sec (the original TNonblockingServer handles just 45K reqs/sec)
- Serializes/deserializes data
- Prevents server overload
- Data is not secured in transit
- Protects the service from invalid requests
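A minimal sketch of how such a server is typically wired up with the Thrift 0.9-era C++ API; the KVStore service, KVStoreHandler, and the generated KVStoreIf/KVStoreProcessor classes are hypothetical stand-ins, not the actual Zing service definition:

```cpp
// Hypothetical IDL (kvstore.thrift):
//   service KVStore { binary get(1: binary key) }
// The KVStore* classes below are what the Thrift compiler would generate.
#include <thrift/concurrency/PosixThreadFactory.h>
#include <thrift/concurrency/ThreadManager.h>
#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/server/TNonblockingServer.h>
#include "KVStore.h"  // hypothetical thrift-generated header

using namespace apache::thrift;
using boost::shared_ptr;

class KVStoreHandler : virtual public KVStoreIf {
 public:
  void get(std::string& _return, const std::string& key) {
    // Look the key up in ICache, falling back to ZiDB on a miss.
  }
};

int main() {
  shared_ptr<KVStoreHandler> handler(new KVStoreHandler());
  shared_ptr<TProcessor> processor(new KVStoreProcessor(handler));
  shared_ptr<protocol::TProtocolFactory> protoFactory(
      new protocol::TBinaryProtocolFactory());

  // One event-driven I/O loop plus a worker pool: frames are read without
  // blocking, then queued, so a slow request cannot stall the socket loop.
  shared_ptr<concurrency::ThreadManager> workers =
      concurrency::ThreadManager::newSimpleThreadManager(16);
  workers->threadFactory(shared_ptr<concurrency::PosixThreadFactory>(
      new concurrency::PosixThreadFactory()));
  workers->start();

  server::TNonblockingServer srv(processor, protoFactory, 9090, workers);
  srv.serve();
  return 0;
}
```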

ICache

- Least Recently Used / time-based expiration strategy
- zlru_table<key_type, value_type>: a hash table data structure
- malloc/free rewritten instead of using the standard glibc malloc/free, to reduce memory fragmentation
- Supports dirty-item marking => enables lazy DB flushes
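zlru_table itself is not published; below is a minimal sketch of the idea (a hash table plus an LRU list, with a dirty flag per entry so a background job can flush lazily to the DB), using standard containers rather than the custom allocator mentioned above:

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Sketch of an LRU table with dirty-item marking (not the real zlru_table;
// the real one also supports time-based expiration and a custom allocator).
template <typename K, typename V>
class zlru_table {
  struct Entry { K key; V value; bool dirty; };
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<K, typename std::list<Entry>::iterator> index_;
  std::size_t capacity_;

 public:
  explicit zlru_table(std::size_t capacity) : capacity_(capacity) {}

  // Write path: mark the entry dirty so it can be flushed to ZiDB later.
  void put(const K& k, const V& v) {
    auto it = index_.find(k);
    if (it != index_.end()) lru_.erase(it->second);
    lru_.push_front(Entry{k, v, /*dirty=*/true});
    index_[k] = lru_.begin();
    if (lru_.size() > capacity_) {
      Entry& victim = lru_.back();  // least recently used
      if (victim.dirty) { /* flush victim to persistent storage first */ }
      index_.erase(victim.key);
      lru_.pop_back();
    }
  }

  const V* get(const K& k) {
    auto it = index_.find(k);
    if (it == index_.end()) return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);  // move to front
    return &it->second->value;
  }
};
```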

ZiDB

- Separated into a DataFile & an IndexFile
- 1 seek for a read, 1-2 seeks for a write
- The IndexFile (hash structure) is loaded into memory as a mapped file (shared memory) to reduce system calls
- Write-ahead log to avoid data loss
- Data magic-padding
- Checksums & checkpoints for data repair
- DB partitioning for easier maintenance
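The deck does not show the on-disk format, but the read path it describes can be sketched: the IndexFile is a fixed-size hash table mmap'd into memory (so index probes are plain memory reads, no system call), and a hit costs exactly one pread() on the DataFile. The slot layout and field names here are assumptions, not the real ZiDB format:

```cpp
#include <cstdint>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical index slot; the real ZiDB layout is not published.
struct IndexSlot {
  uint64_t key_hash;   // 0 = empty slot
  uint64_t offset;     // position of the record in the DataFile
  uint32_t size;       // record length
  uint32_t checksum;   // used to detect/repair corruption
};

// Map the IndexFile once; every later lookup is a plain memory access.
IndexSlot* map_index(int idx_fd, size_t n_slots) {
  void* p = mmap(nullptr, n_slots * sizeof(IndexSlot),
                 PROT_READ | PROT_WRITE, MAP_SHARED, idx_fd, 0);
  return p == MAP_FAILED ? nullptr : static_cast<IndexSlot*>(p);
}

// Read path: probe the in-memory hash index, then do exactly one
// pread() on the DataFile -- i.e. one disk seek per read.
bool zidb_get(int data_fd, IndexSlot* index, size_t n_slots,
              uint64_t key_hash, std::string* out) {
  for (size_t i = 0; i < n_slots; ++i) {
    IndexSlot& s = index[(key_hash + i) % n_slots];  // linear probing
    if (s.key_hash == 0) return false;               // empty slot: key absent
    if (s.key_hash == key_hash) {
      out->resize(s.size);
      return pread(data_fd, &(*out)[0], s.size, s.offset) ==
             static_cast<ssize_t>(s.size);
    }
  }
  return false;
}
```

A write would hit the commit log first (the write-ahead log above) and then the DataFile and index, which is where the 1-2 seeks per write come from.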

Distribution

Key requirements:
- Scalability
- Load balance
- Availability
- Consistency

2 models:
- Centralized: 1 addressing server & multiple storage servers => bottleneck & single point of failure
- Peer-to-peer: each server includes an addressing module & storage

2 types of routing:
- Client routing: each client does the addressing itself and queries the data
- Server routing: the addressing is done at the server

Operation Flows

[Flow diagram] A Business Logic Server, an Addressing Server (DHT), and a storage layer of Storage Node 1 … Storage Node N (each running an ICache + ZiDB storage module):

(1) The Business Logic Server requests key locations
(2) The Addressing Server returns the key locations
(3) Get & Set operations go to the storage nodes
(4) The operation results are returned

* In the peer-to-peer model, the addressing module is moved into each storage node.

Addressing:

- Provides the key locations of resources
- Basically a Distributed Hash Table, using consistent hashing
- Hashing: Jenkins, Murmur, or any algorithm that satisfies two conditions:
  - Uniform distribution of generated keys in the key space
  - Consistency
  (MD5 and SHA are bad choices because of their performance)
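A minimal consistent-hash ring in the spirit described here (a sketch, not the Zing implementation): std::hash stands in for Jenkins/Murmur, and the vnodes parameter anticipates the virtual-node scheme described below.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Minimal consistent-hash ring; std::hash is a stand-in for Jenkins/Murmur.
class HashRing {
  std::map<uint64_t, std::string> ring_;  // ring position -> server name

  static uint64_t hash(const std::string& s) {
    return std::hash<std::string>{}(s);
  }

 public:
  // Each real server is inserted at several ring positions ("virtual nodes").
  void add_node(const std::string& server, int vnodes = 1) {
    for (int i = 0; i < vnodes; ++i)
      ring_[hash(server + "#" + std::to_string(i))] = server;
  }

  void remove_node(const std::string& server, int vnodes = 1) {
    for (int i = 0; i < vnodes; ++i)
      ring_.erase(hash(server + "#" + std::to_string(i)));
  }

  // A key belongs to the first node clockwise from its hash position, so
  // adding or removing a node only remaps the neighboring arc of the ring.
  // Assumes at least one node has been added.
  const std::string& node_for(const std::string& key) const {
    auto it = ring_.lower_bound(hash(key));
    if (it == ring_.end()) it = ring_.begin();  // wrap around
    return it->second;
  }
};
```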

Addressing - Node location:

Each node is assigned a contiguous range of IDs (hashed keys)

Addressing - Node location: Golden ratio principle (a/b = (a+b)/a)

- Initial ratio = 1.618
- Max ratio ~ 2.6
- Easy to implement
- Easy for routing from the client

[Ring diagram] Example assignment of IDs 1-9: Server 1: 1,2,3; Server 2: 4,5,6,7; Server 3: 8,9

Addressing - Node location: Virtual nodes

- Each real server has multiple virtual nodes on the ring
- The more virtual nodes, the better the load balance
- But the table of nodes becomes harder to maintain
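Reusing the hypothetical HashRing sketch from the addressing slide, the trade-off is visible directly in the vnodes parameter:

```cpp
// More virtual nodes per server -> smoother load balance, but a larger
// node table to maintain (3 servers x 128 ring positions here).
HashRing ring;
ring.add_node("storage-1", /*vnodes=*/128);
ring.add_node("storage-2", /*vnodes=*/128);
ring.add_node("storage-3", /*vnodes=*/128);
// Each server now owns roughly 1/3 of the key space, in many small arcs.
```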

[Ring diagram: virtual nodes of servers A, B, and C interleaved on the ring]

Addressing – Multi-layer rings

- Stores the change history of the system
- Provides availability/reconfigurability
- A node can be placed on a ring manually

* Write: data is located on the highest ring
* Read: data is looked up on the highest ring first, then on lower rings if not found
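A sketch of that write-high/read-through rule, assuming each layer is one HashRing as sketched earlier; LayeredRings and fetch_from are hypothetical names, not Zing APIs:

```cpp
#include <string>
#include <vector>

// Hypothetical multi-layer ring: layers_.back() is the highest (newest)
// ring, i.e. the most recent configuration of the system.
class LayeredRings {
  std::vector<HashRing> layers_;

  // Stand-in for an RPC to one storage node; always misses in this sketch.
  bool fetch_from(const std::string& node, const std::string& key,
                  std::string* out) const {
    (void)node; (void)key; (void)out;
    return false;
  }

 public:
  void push_layer(const HashRing& ring) { layers_.push_back(ring); }

  // Writes always go to the node chosen by the highest ring.
  const std::string& write_target(const std::string& key) const {
    return layers_.back().node_for(key);
  }

  // Reads try the highest ring first, then fall through to lower rings,
  // so data written before a reconfiguration can still be found.
  bool read(const std::string& key, std::string* out) const {
    for (auto it = layers_.rbegin(); it != layers_.rend(); ++it)
      if (fetch_from(it->node_for(key), key, out)) return true;
    return false;
  }
};
```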

Replication & Backup

- Each node has one primary range of IDs and some secondary ranges of IDs
- Each real node needs a backup instance that can take over if it goes down

* Data is queried from the primary node first, then from the secondary nodes

Configuration: finding the best parameters for the DB, or choosing the most suitable DB type.

- How many reads/writes per second?
- Deviation of data lengths: are values roughly the same size, or do they differ a lot?
- Is data updated/deleted?
- How important is the data: is loss acceptable or not?
- Can old data be recycled?

Q & A

Contact: Nguyễn Quang Nam
namnq@vng.com.vn
http://me.zing.vn/nam.nq