How we scaled GitLab for a 30k-employee company

Minqi Pan

Hello, I’m Minqi Pan

github.com/pmq20

twitter @psvr

What’s GitLab?

GitLab: a Git box installed on-premises

GitLab

• front end: HTTP (ports 80/443) and SSH (port 22)
• backing services: Redis, MySQL, file system

What’s inside GitLab?

• NGINX, OpenSSH Server
• Unicorn, GitLab Workhorse, gitlab-shell
• git
• gitlab_git, Rails, Sidekiq, Rugged
• libgit2

Works great for small teams

However

“To make it easy to do business anywhere” (Alibaba’s mission)

Let’s scale it!

• starting point: one GitLab box serving HTTP 80/443 and SSH 22
• run many Unicorn instances (unicorn, unicorn, unicorn, …) behind HTTP 80/443 and SSH 22
• put nginx in front of the Unicorns for HTTP; but what about SSH?
• add ssh2http (https://github.com/pmq20/ssh2http) alongside nginx, bridging SSH git traffic to HTTP so that it also reaches the Unicorns
• put LVS (IPVS) in front, balancing HTTP 80/443 and SSH 22 across the Unicorn nodes

Linux Virtual Server (IP Virtual Server)

• transport-layer load balancing inside the kernel
• layer-4 switching, unlike nginx (layer 7)
• can: IP weighting, IP blocking, health checking
• can’t: HTTP-level health checking (expecting 200 OK), URL rewriting
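
IP weighting is applied by LVS's schedulers inside the kernel. For illustration only, here is a Python transcription of the weighted round-robin (wrr) loop described in the LVS scheduling documentation; it is a sketch of the selection logic, not kernel code, and the server names and weights used with it are hypothetical.

```python
from functools import reduce
from math import gcd


class WeightedRoundRobin:
    """Sketch of LVS-style weighted round-robin selection (illustrative only)."""

    def __init__(self, servers):
        # servers: list of (name, weight) pairs; a weight of 0 disables a server
        self.names = [name for name, _ in servers]
        self.weights = [weight for _, weight in servers]
        self.step = reduce(gcd, self.weights)  # threshold decrement per pass
        self.max_weight = max(self.weights)
        self.index = -1           # position of the last selected server
        self.current_weight = 0   # servers must weigh at least this much

    def next(self):
        if self.max_weight == 0:
            return None  # every server is disabled
        while True:
            self.index = (self.index + 1) % len(self.names)
            if self.index == 0:
                # Finished a pass: lower the threshold, wrapping back to the max weight.
                self.current_weight -= self.step
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            if self.weights[self.index] >= self.current_weight:
                return self.names[self.index]
```

With weights 4, 3 and 2 for servers a, b and c, nine consecutive calls return a, a, b, a, b, c, a, b, c: each server receives connections in proportion to its weight, interleaved rather than in bursts.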

Complications

• SSH host key synchronisation: do it once, so that every node behind LVS presents the same host key to clients
• SSH client key synchronisation: do it every time users add or remove keys
• synchronised via Redis pub/sub
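
A minimal sketch of client-key synchronisation, assuming every front-end node subscribes to one channel and applies each add/remove message to its local key set. `Broker` is a hypothetical in-memory stand-in for Redis PUBLISH/SUBSCRIBE (with redis-py the equivalents are `publish` and `pubsub().subscribe`); the channel name and the JSON message format are assumptions.

```python
import json
from collections import defaultdict


class Broker:
    """Hypothetical in-memory stand-in for Redis PUBLISH/SUBSCRIBE."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Like Redis, deliver to current subscribers only; nothing is stored.
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])  # number of receivers


class FrontEndNode:
    """One SSH node behind LVS, keeping a local copy of users' public keys."""

    CHANNEL = "ssh-keys"  # hypothetical channel name

    def __init__(self, name, broker):
        self.name = name
        self.authorized_keys = {}  # key id -> public key line
        broker.subscribe(self.CHANNEL, self.on_message)

    def on_message(self, message):
        event = json.loads(message)
        if event["action"] == "add":
            self.authorized_keys[event["id"]] = event["key"]
        elif event["action"] == "remove":
            self.authorized_keys.pop(event["id"], None)


def publish_key_change(broker, action, key_id, key=None):
    """Called once per key change (e.g. by the web app); every node applies it."""
    payload = json.dumps({"action": action, "id": key_id, "key": key})
    return broker.publish(FrontEndNode.CHANNEL, payload)
```

Redis pub/sub is fire-and-forget: a node that is down while a message is published never receives it, so in a sketch like this a node would need a full resync of keys whenever it (re)joins.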

Does it scale in the backend?

The Twelve-Factor App, IV. Backing services: treat backing services as attached resources

GitLab’s backing services:

• 🤔 Redis
• 🤔 MySQL
• 🤔 File System (git repositories, user-generated attachments / avatars)

GitLab Geo

• introduced in GitLab 8.5 EE
• 1 master, N slave replication
• achieves A-P in the C-A-P theorem
• no disaster recovery
• no sharding

• HTTP 80/443 → nginx
• SSH 22 → ssh2http
• routing via the key namespace/repo_name
• three GitLab shards, each paired with its own FS shard
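
The slides name the routing key but not how keys map to shards. A minimal sketch, assuming a stable hash (CRC-32 from `zlib`) of namespace/repo_name taken modulo a hypothetical shard list; the path parsing is also an assumption about where the key comes from.

```python
import zlib

# Hypothetical shard list; each entry is a GitLab shard with its own FS shard.
SHARDS = ["gitlab-shard-0", "gitlab-shard-1", "gitlab-shard-2"]


def routing_key(path):
    """Extract 'namespace/repo_name' from a request path such as /ns/repo.git/info/refs."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"no namespace/repo_name in {path!r}")
    namespace, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{namespace}/{repo}"


def pick_shard(path, shards=SHARDS):
    # A stable hash, so every front-end node routes the same key to the same shard.
    key = routing_key(path)
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]
```

Plain modulo hashing moves most repositories when the shard count changes; a lookup table from namespace/repo_name to shard, kept in a database, avoids that at the cost of one extra read per request.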

GitLab Sharding

• introduces Sidekiq sharding as well
• introduces many changes to the application layer as well:
  - need super-user authentication
  - need to eliminate every page whose requests span shards (e.g. the admin page of repository sizes)
• tedious changes at the application level

How to deal with FS?

• 🤔 Hardware Network-Attached Storage?
• 🤔 Software Network-Attached Storage?
• 🤔 Remote Procedure Calls to FS shards?
• 🤔 Kill it?

• Hard-NAS: Alibaba has a non-IOE policy (no IBM, Oracle or EMC).
• Soft-NAS: Alibaba does not have one yet.
• RPC: GitRPC? Good. GitHub does that.
• Kill FS: use the cloud. Try something new!

by “cloud” we mean…

• Amazon S3: Amazon Simple Storage Service
• Alibaba OSS: Alibaba Object Storage Service

gitlab-rails → libgit2, git, grit

grit
• used in wikis
• via gollum-lib
• via gollum-grit_adapter
• eliminable via gollum-rugged_adapter

gitlab-rails

libgit2
• via gitlab_git
• via rugged
• backend: replaceable (libgit2 accepts custom odb and refdb backends)

git
• via gitlab-shell
• via gitlab-workhorse
• via popen
• backend: hard to replace (FS)

grit: wikis only

Basic Idea

gitlab-workhorse, gitlab-rails, gitlab-shell
↓
git, libgit2, grit
↓
Cloud Based Backend

Cloud Based Backend

odb
• loose OSS store (hi-priority)
• packed OSS store (lo-priority)

refdb
• stored via OSS
• locked via Redis
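
libgit2 lets an odb consult several backends registered with `git_odb_add_backend`, each with a priority, and reads from them in priority order. A Python sketch of that lookup order, with hypothetical in-memory dicts standing in for the loose and packed OSS stores and illustrative priority numbers:

```python
class OdbBackend:
    """Hypothetical object-store backend; `store` stands in for objects in OSS."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # dict: hex SHA-1 -> (type, bytes)

    def read(self, oid):
        return self.store.get(oid)  # None means "not found in this backend"


class Odb:
    """Consults backends from highest to lowest priority."""

    def __init__(self):
        self.backends = []  # list of (priority, backend)

    def add_backend(self, backend, priority):
        self.backends.append((priority, backend))
        self.backends.sort(key=lambda pb: pb[0], reverse=True)  # highest first

    def read(self, oid):
        for _, backend in self.backends:
            obj = backend.read(oid)
            if obj is not None:
                return backend.name, obj
        raise KeyError(oid)
```

Putting the loose store first means a recently pushed object costs one lookup by name, and only misses fall through to the packed store.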

OSS refdb (read)

OSS refdb (write)

loose OSS store (write)

loose OSS store (read)
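
Per object, the loose OSS store presumably holds the same bytes git keeps under .git/objects: zlib-compressed `<type> <size>\0<content>`, named by the SHA-1 of the uncompressed bytes. A sketch of encoding and decoding one object; the `oss_key` layout is a hypothetical mirror of the .git/objects/xx/yyyy… path.

```python
import hashlib
import zlib


def loose_object(obj_type, content):
    """Encode an object as git stores it loose: zlib(b'<type> <size>\\0' + content)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    raw = header + content
    oid = hashlib.sha1(raw).hexdigest()  # the name hashes the uncompressed bytes
    return oid, zlib.compress(raw)


def oss_key(repo, oid):
    # Hypothetical key layout mirroring .git/objects/xx/yyyy... inside a bucket.
    return f"{repo}/objects/{oid[:2]}/{oid[2:]}"


def parse_loose_object(data):
    """Inverse of loose_object(): returns (type, content)."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size mismatch")
    return obj_type, content
```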

packed OSS store (write)

packed OSS store (read)

via HTTP “Range” header
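
Reading one packed object from OSS only needs the bytes of that entry, not the whole pack. A minimal sketch of a ranged GET with Python's `urllib`, assuming the store honours standard HTTP `Range` requests (as S3 does); the URL is hypothetical.

```python
import urllib.request


def range_request(url, offset, length):
    """Build a GET for bytes [offset, offset + length) of a stored pack file."""
    if length <= 0:
        raise ValueError("length must be positive")
    last = offset + length - 1  # HTTP byte ranges are inclusive at both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def read_range(url, offset, length):
    # Expect 206 Partial Content; a server ignoring Range would send 200 and the whole file.
    with urllib.request.urlopen(range_request(url, offset, length)) as resp:
        if resp.status != 206:
            raise IOError(f"range not honoured: HTTP {resp.status}")
        return resp.read()
```

For an entry at offset 280621 that occupies 32 bytes in the pack, the request carries `Range: bytes=280621-280652`.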

Example: read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

• First byte of the name is 0x9f
• IDX[8 + (0x9f - 1) * 4] == 0x0403 == 1027 (8 = 4-byte magic + 4-byte version, followed by the 256-entry fan-out table)
• IDX[8 + 0x9f * 4] == 0x0405 == 1029
• so the object, if present, is one of Nos. 1027 ~ 1029 (i.e. 1027 or 1028)
• Binary search 1027 ~ 1029
• Found at 8 + 4 * 256 + 1027 * 20 == 21572, i.e. object No. 1027
• Skip all names and CRC32s: total_num * (20 + 4) == 1628 * 24
• IDX[8 + 4 * 256 + 1628 * 24 + 4 * 1027] == 0x0004482D
• PACK[0x0004482D] == PACK[280621]
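
The lookup above follows the pack index version 2 layout: 4-byte magic, 4-byte version, 256-entry fan-out table, sorted 20-byte names, CRC32s, 4-byte offsets, then 8-byte offsets for large packs. A minimal sketch over an in-memory copy of the .idx; in the OSS setting each slice would instead be fetched with a ranged GET.

```python
import bisect
import struct

IDX_MAGIC = b"\xfftOc"


def find_offset(idx, oid_hex):
    """Return the pack offset of an object, given a version-2 .idx as bytes."""
    if idx[:4] != IDX_MAGIC or struct.unpack(">I", idx[4:8])[0] != 2:
        raise ValueError("not a v2 pack index")
    oid = bytes.fromhex(oid_hex)
    fanout = struct.unpack(">256I", idx[8:8 + 4 * 256])
    total = fanout[255]                            # total_num
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0     # names with a smaller first byte
    hi = fanout[first]                             # names with first byte <= ours
    names_at = 8 + 4 * 256
    # Binary search only inside [lo, hi), comparing 20-byte names.
    names = [idx[names_at + 20 * i:names_at + 20 * (i + 1)] for i in range(lo, hi)]
    pos = bisect.bisect_left(names, oid)
    if pos == len(names) or names[pos] != oid:
        raise KeyError(oid_hex)
    i = lo + pos
    # Skip all names and all CRC32s: total_num * (20 + 4) bytes.
    offsets_at = names_at + total * (20 + 4)
    (offset,) = struct.unpack(">I", idx[offsets_at + 4 * i:offsets_at + 4 * i + 4])
    if offset & 0x80000000:
        # MSB set: the low 31 bits index the 8-byte offset table (packs over 2 GiB).
        large_at = offsets_at + 4 * total
        j = offset & 0x7FFFFFFF
        (offset,) = struct.unpack(">Q", idx[large_at + 8 * j:large_at + 8 * j + 8])
    return offset
```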

Example: read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

Object header at PACK[0x0004482D]: 3-bit type, (n-1)*7+4-bit length
• E3 = 11100011
  1_______ => MSB 1, continue
  _110____ => type == 6 == OFS_DELTA
  ____0011 => length == 3
• 01 = 00000001
  0_______ => MSB 0, break
  _0000001 => length += (1 << 4)
• final length == 19
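
The same decoding as a short sketch, following the pack format: 3 type bits, then 4 + 7·k length bits, least significant group first. The type-name table lists git's standard object type numbers.

```python
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}


def read_object_header(pack, pos):
    """Decode a pack entry header at `pos`; returns (type, size, position after header)."""
    byte = pack[pos]
    obj_type = (byte >> 4) & 0x07  # 3-bit type
    size = byte & 0x0F             # first 4 bits of the length
    shift = 4
    pos += 1
    while byte & 0x80:             # MSB 1: another length byte follows
        byte = pack[pos]
        size |= (byte & 0x7F) << shift  # 7 more bits, least significant group first
        shift += 7
        pos += 1
    return OBJ_TYPES.get(obj_type, f"unknown({obj_type})"), size, pos
```

Fed the bytes E3 01 from the example, it returns `("ofs_delta", 19, 2)`.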

Example: read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

OFS_DELTA base offset, right after the object header at PACK[0x0004482D]:
• AA = 10101010
  1_______ => MSB 1, continue
  _0101010 => base offset == 42
• 44 = 01000100
  0_______ => MSB 0, break
  _1000100 => offset == ((42 + 1) << 7) + 68 == 5572
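
The base-offset decoding as a sketch. Note the `+ 1` applied on every continuation byte, which the object-size encoding does not have; it keeps multi-byte encodings from overlapping shorter ones.

```python
def read_ofs_delta_offset(pack, pos):
    """Decode the negative base offset that follows an OFS_DELTA header."""
    byte = pack[pos]
    offset = byte & 0x7F
    pos += 1
    while byte & 0x80:  # MSB 1: continue
        byte = pack[pos]
        offset = ((offset + 1) << 7) | (byte & 0x7F)  # most significant group first
        pos += 1
    return offset, pos
```

For AA 44 it returns `(5572, 2)`, and 0x0004482D - 5572 == 275049 is where the base entry starts.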

Example: read 9fcf811e00fa469688943a9152c16d4ee90fb9a9

• offset == 5572
• push 0x0004482D onto the stack
• deal with (0x0004482D - 5572)
• push (0x0004482D - 5572) onto the stack
• …
• root base
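
A sketch of the stack walk plus git's delta format (a copy-from-base / insert-literal instruction stream). `entries` is a hypothetical, already-inflated view of the pack keyed by offset; real code would read and inflate each entry from the pack instead.

```python
def read_varint_le(data, pos):
    """Size varint used inside delta data: 7 bits per byte, least significant first."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base, delta):
    """Apply one git delta (copy/insert instructions) to `base`."""
    base_size, pos = read_varint_le(delta, 0)
    result_size, pos = read_varint_le(delta, pos)
    if base_size != len(base):
        raise ValueError("base size mismatch")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:  # copy from base; bits 0-3 flag offset bytes, bits 4-6 size bytes
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            out += base[offset:offset + (size or 0x10000)]  # size 0 means 64 KiB
        elif cmd:       # insert the next `cmd` literal bytes
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta opcode 0")
    if len(out) != result_size:
        raise ValueError("result size mismatch")
    return bytes(out)


def resolve(entries, offset):
    """Push deltas while following base offsets down to the root base, then replay them.

    entries: offset -> ("delta", base_offset, delta_bytes) or ("base", content).
    """
    stack = []
    entry = entries[offset]
    while entry[0] == "delta":
        stack.append(entry[2])     # remember this delta
        entry = entries[entry[1]]  # deal with its base next
    content = entry[1]             # root base reached
    while stack:
        content = apply_delta(content, stack.pop())  # innermost delta first
    return content
```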

Example: the delta chain of 9fcf811e00fa469688943a9152c16d4ee90fb9a9

SHA1                                      type  size   size-pack  offset-pack  depth  base
9fcf811e00fa469688943a9152c16d4ee90fb9a9  blob     19         32       280621      4  6110c89446f2281e5db9b798a0fa020fad6e63e1
6110c89446f2281e5db9b798a0fa020fad6e63e1  blob     52         45       275049      3  3bbeff3fc22b75c1a26f4ab9b64449b33002aea5
3bbeff3fc22b75c1a26f4ab9b64449b33002aea5  blob   2935       1263       273786      2  a399208309046656ecc01f7653c5d5b8905fc16e
a399208309046656ecc01f7653c5d5b8905fc16e  blob   4686       1540       272246      1  e4e56117de8b3bd0bd899701da4712caee27c7d6
e4e56117de8b3bd0bd899701da4712caee27c7d6  blob  12635       3279       115703      0  -

git → libgit2

git fetch / clone

• git upload-pack --advertise-refs (rewritten via libgit2)
• git upload-pack (untouched)
• git pack-objects (rewritten via libgit2 pack builder)

git push (small data)

• git receive-pack --advertise-refs (rewritten via libgit2)
• git receive-pack (untouched)
• ntohl(hdr.hdr_entries) < unpack_limit
• git unpack-objects (modified via libgit2, writing to the loose OSS store)

git push (big data)

• git receive-pack --advertise-refs (rewritten via libgit2)
• git receive-pack (untouched)
• ntohl(hdr.hdr_entries) >= unpack_limit
• git index-pack (modified via libgit2, writing to the packed OSS store)
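
The choice between the two push paths hinges on the entry count in the 12-byte pack header ("PACK", version, number of entries, all in network byte order). A sketch of that decision; 100 is git's documented default for `transfer.unpackLimit` (overridable by `receive.unpackLimit`), and the returned strings are only labels.

```python
import struct

UNPACK_LIMIT = 100  # git's default transfer.unpackLimit


def parse_pack_header(data):
    """Parse 'PACK' + version + entry count; returns ntohl(hdr.hdr_entries)."""
    signature, version, entries = struct.unpack(">4sII", data[:12])
    if signature != b"PACK" or version not in (2, 3):
        raise ValueError("not a pack stream")
    return entries


def choose_store(data, unpack_limit=UNPACK_LIMIT):
    """Mirror receive-pack's choice between unpack-objects and index-pack."""
    if parse_pack_header(data) < unpack_limit:
        return "git unpack-objects -> loose OSS store"
    return "git index-pack -> packed OSS store"
```

Small pushes thus become individual loose objects, while large ones are kept as a pack plus its index.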

Naked Benchmark (no cache)

Fixture

• Repository: gitlab-ce
• https://gitlab.com/gitlab-org/gitlab-ce.git
• More than 200k objects
• More than 100 MB when packed

git push

• FS-based: 6.27s user 1.72s system 14% cpu 53.299 total
• Cloud-based: 6.13s user 1.29s system 13% cpu 54.697 total

git push (delta)

• FS-based: 0.09s user 0.07s system 5% cpu 3.059 total
• Cloud-based: 0.04s user 0.05s system 3% cpu 2.845 total

git clone

• FS-based: 6.89s user 8.99s system 33% cpu 47.096 total
• Cloud-based: 7.08s user 8.12s system 20% cpu 1:14.12 total

git fetch (delta)

• FS-based: 0.14s user 0.13s system 33% cpu 0.806 total
• Cloud-based: 0.09s user 0.10s system 1% cpu 16.019 total

GET /namespace/repo/tree/master

• FS-based: Executing action: show - 74.5 ms
• Cloud-based: Executing action: show - 5877.7 ms

GET /namespace/repo/tree/master/builds

• FS-based: Executing action: show - 50.0 ms
• Cloud-based: Executing action: show - 4547.0 ms

Cache

odb (a “hamburger” of layers)
• hi-priority: loose FS cache over the loose OSS store
• lo-priority: packed FS cache over the packed OSS store

refdb
• cached via Redis

loose FS cache

• cache written when ntohl(hdr.hdr_entries) < unpack_limit in git-unpack-objects
• cache written when reading via the loose OSS store
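
A minimal read-through sketch of the loose FS cache, assuming one local file per object under a hypothetical cache directory, with `oss_get` standing in for a read from the loose OSS store. Because git objects are content-addressed and immutable, the cache never needs invalidation.

```python
import os
import tempfile


class LooseFsCache:
    """Local-disk cache in front of a (hypothetical) loose OSS store."""

    def __init__(self, cache_dir, oss_get):
        self.cache_dir = cache_dir
        self.oss_get = oss_get  # callable: oid -> compressed bytes, or None
        self.oss_reads = 0

    def _path(self, oid):
        return os.path.join(self.cache_dir, oid[:2], oid[2:])

    def write(self, oid, data):
        # Used on the push path (unpack-objects) and after a read from OSS.
        path = self._path(oid)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename: readers never see half a file

    def read(self, oid):
        path = self._path(oid)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        self.oss_reads += 1
        data = self.oss_get(oid)
        if data is not None:
            self.write(oid, data)  # immutable object: safe to keep forever
        return data
```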

packed FS cache

• cache written when ntohl(hdr.hdr_entries) >= unpack_limit in git-index-pack
• cache written in git-pack-objects

Redis refdb cache

• cache written on read when there is a cache miss
• cache expired when the refdb gets updated, e.g. by git-receive-pack
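
A cache-aside sketch of the refdb cache: fill on a read miss, expire on every ref update. A plain dict stands in for Redis (with redis-py the calls would be `get`, `set` and `delete`), the backend dict stands in for the OSS refdb, and the `refdb:<repo>` key layout is hypothetical.

```python
class RefdbCache:
    """Cache-aside refdb: `cache` stands in for Redis, `backend` for the OSS refdb."""

    def __init__(self, backend, cache=None):
        self.backend = backend  # dict: repo -> {ref name: oid}
        self.cache = {} if cache is None else cache
        self.backend_reads = 0

    @staticmethod
    def _key(repo):
        return f"refdb:{repo}"  # hypothetical Redis key layout

    def refs(self, repo):
        key = self._key(repo)
        if key in self.cache:  # hit
            return self.cache[key]
        self.backend_reads += 1  # miss: read from OSS, then fill the cache
        refs = dict(self.backend.get(repo, {}))
        self.cache[key] = refs
        return refs

    def update_ref(self, repo, name, oid):
        """Write path, e.g. at the end of git-receive-pack."""
        self.backend.setdefault(repo, {})[name] = oid
        self.cache.pop(self._key(repo), None)  # expire rather than rewrite
```

Expiring instead of rewriting the cached entry keeps the write path simple: the next reader repopulates it from the authoritative refdb.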

Future Work

• develop libgit2 backends for AWS S3
• gitlab: favour libgit2, eliminate direct calls to git
• gitlab: add settings to choose backends
• gollum: use rugged as the default
• libgit2: improve performance, e.g. pack builder

https://github.com/pmq20