Git Internals

GIT Internals Pedro Melo <{mailto,xmpp}:[email protected]>

description

An explanation of how a Git repository is organized, the types of objects it contains, and the relationships between them.

Transcript of Git Internals

Page 2: Git Internals

A short GIT History

• 2002 ⇒ Apr 2005: The BitKeeper Wars

• Apr 2005: Episode IV - A New Hope

• July 2005: Hamano is the new maintainer

• Late 2008: GitHub hits the spotlight

“I’m an egotistical bastard, and I name all my projects after myself. First Linux, now git.” – Linus

Page 4: Git Internals

GIT rules (without !!)

• Track content, not changes

• Simple repository

• Complex software

• It's easier to update the software than to update every repository created so far

Git Mantra: http://bit.ly/git-phylosophy

Page 5: Git Internals

In other words, I'm right. I'm always right, but sometimes I'm more right than other times. And dammit, when I say "files don't matter", I'm really really Right(tm).

Linus

Page 6: Git Internals

Strong Points

• Non-Linear development

• Distributed Development

• Centralized development is a subcase

• Efficiency

• Toolkit Design

Page 7: Git Internals

Objects

• Git repositories store objects

• Stored in the Object Database

• Inside the Git directory

• .git at the root of your project

• Four major object types

• Objects are compressed for storage (zlib)

• SHA1 of header+content ⇒ ID
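The storage rules above can be sketched in a few lines of Python. This is a minimal illustration, not Git's own code: the function names are made up for the example, and it assumes the loose-object layout in which the first two hex digits of the ID name a directory and the remaining 38 name the file.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def encode_object(obj_type, content):
    """Prefix the content with the '<type> <size>\\0' header."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def object_id(obj_type, content):
    """SHA-1 of header + content, as 40 hex digits."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def write_loose_object(git_dir, obj_type, content):
    """Store the zlib-compressed object under objects/<2 hex>/<38 hex>."""
    oid = object_id(obj_type, content)
    path = git_dir / "objects" / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(encode_object(obj_type, content)))
    return oid


def read_loose_object(git_dir, oid):
    """Inverse of write_loose_object: decompress, then split off the header."""
    raw = zlib.decompress((git_dir / "objects" / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    assert int(size) == len(content)
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / ".git"  # throwaway directory, not a real repo
        oid = write_loose_object(git_dir, "blob", b"hello world\n")
        print(oid)
        print(read_loose_object(git_dir, oid))
```

Because the ID is derived from the bytes themselves, writing the same content twice lands on the same path, which is why identical objects are only stored once.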

Page 8: Git Internals

The Blob

• Files are stored as blobs

• Only content, no metadata

Page 9: Git Internals

blob [content_size]\0Your content goes here after the header

I like pizza with apples

Meet the blob
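To make "only content, no metadata" concrete, here is a small sketch using the slide's sample text. The file paths are hypothetical and exist only to show that a path never enters the hash.

```python
import hashlib


def blob_bytes(content):
    """The exact bytes that get hashed: 'blob <size>\\0' plus the content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\x00" + content


def blob_id(content):
    return hashlib.sha1(blob_bytes(content)).hexdigest()


# Hypothetical working tree: two paths that happen to hold the same content.
files = {
    "notes/food.txt": b"I like pizza with apples",
    "copy-of-food.txt": b"I like pizza with apples",
    "other.txt": b"I like pizza with pineapple",
}

ids = {path: blob_id(content) for path, content in files.items()}

print(blob_bytes(files["notes/food.txt"]))  # b'blob 24\x00I like pizza with apples'
for path, oid in ids.items():
    print(oid[:7], path)
```

The two files with identical content share one blob ID; the file name and permissions live in the tree that points at the blob, not in the blob itself.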

Page 10: Git Internals

The tree

• Trees store directories

• Mode, type, pointer and name

• Recursive, trees can contain trees

• Stored as a simple list of entries (binary on disk, printed as text by git cat-file -p)

Page 11: Git Internals

tree [content_size]\0
100644 blob b5f21a README
100644 blob afe433 Makefile.PL
040000 tree a42cd0 lib

Meet the tree
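The listing on the slide is the human-readable view of a tree. As a rough sketch of the stored form, each entry is written as "<mode> <name>\0" followed by the 20 raw bytes of the ID, and directories use the mode 40000 without the leading zero. The helper names and the file contents below are assumptions made for the example.

```python
import hashlib


def blob_id(content):
    data = b"blob " + str(len(content)).encode() + b"\x00" + content
    return hashlib.sha1(data).hexdigest()


def tree_content(entries):
    """Serialize (mode, name, hex_id) tuples into the body of a tree object."""
    def sort_key(entry):
        mode, name, _ = entry
        # Entries are sorted by name; directories compare as if they ended in '/'.
        return (name + "/" if mode == "40000" else name).encode()

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def tree_id(entries):
    content = tree_content(entries)
    header = f"tree {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def parse_tree(content):
    """Read the entries back out of a serialized tree body."""
    entries, pos = [], 0
    while pos < len(content):
        space = content.index(b" ", pos)
        nul = content.index(b"\x00", space)
        mode = content[pos:space].decode()
        name = content[space + 1:nul].decode()
        entries.append((mode, name, content[nul + 1:nul + 21].hex()))
        pos = nul + 21
    return entries


# Made-up file contents standing in for the slide's project.
lib = tree_id([("100644", "Cool.pm", blob_id(b"package Cool;\n1;\n"))])
root = [
    ("100644", "README", blob_id(b"hypothetical README\n")),
    ("100644", "Makefile.PL", blob_id(b"hypothetical Makefile.PL\n")),
    ("40000", "lib", lib),
]

for mode, name, hex_id in parse_tree(tree_content(root)):
    kind = "tree" if mode == "40000" else "blob"
    print(f"{mode:0>6} {kind} {hex_id[:6]} {name}")

print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the empty tree
```

Since the root tree embeds the ID of lib, and lib embeds the ID of Cool.pm, changing one file changes every tree on the path up to the root and nothing else.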

Page 12: Git Internals

The commit

• The object that makes history

• Pointer to a tree and the parent(s) commits if any

• Author, committer and commit message

Page 13: Git Internals

commit [content_size]\0
tree 23edfc
author Pedro Melo <[email protected]> 1243036800
committer Pedro Melo <[email protected]> 1243036800

commit without a parent

usually called first commit

Meet the commit...

Page 14: Git Internals

commit [content_size]\0
tree fde45c
parent 3454df
author Pedro Melo <[email protected]> 1243036932
committer Pedro Melo <[email protected]> 1243036932

and we fixed that nasty bug

after all, they do tend to crop up

...and its child the other commit
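The two commits above can be imitated with a short sketch. It assumes the plain-text layout shown on the slides (header lines, a blank line, then the message); the identity string is a placeholder, and real Git also appends a timezone offset after the timestamp, which the example includes.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of the empty tree


def build_commit(tree, parents, author, committer, message):
    """Assemble the body of a commit object as bytes."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(content):
    header = b"commit " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def parse_commit(content):
    """Split a commit body back into its header fields and message."""
    head, _, message = content.decode("utf-8").partition("\n\n")
    fields = {"parents": []}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)
        else:
            fields[key] = value
    fields["message"] = message
    return fields


# Placeholder identity, not taken from the slides.
who = "A U Thor <author@example.com> 1243036800 +0000"

first = build_commit(EMPTY_TREE, [], who, who, "commit without a parent\n")
first_id = commit_id(first)

second = build_commit(EMPTY_TREE, [first_id], who, who,
                      "and we fixed that nasty bug\n")

print(parse_commit(first)["parents"])   # []
print(parse_commit(second)["parents"] == [first_id])  # True
```

The child stores its parent's ID inside its own content, so the child's ID indirectly covers the whole history behind it.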

Page 15: Git Internals

The tag

• A name for a particular commit

• Can contain a message

• Optionally GPG signed

• Allows for cryptographically secure releases

Page 16: Git Internals

tag [content_size]\0
object 123fec
type commit
tag v1
tagger Pedro Melo <[email protected]> 1243037423

made it to 1.0!

Meet the tag

Page 17: Git Internals

Git Data Model Recap

• Immutable objects

• A file per object

• Repacked into object packs for efficiency

• Organized as a directed acyclic graph
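As an illustration of the graph structure, the sketch below walks a tiny, invented history from a commit back to the root by following parent pointers. Commit names are single letters for readability; in a real repository they would be object IDs.

```python
from collections import deque

# Hypothetical history: commit -> list of parent commits.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "M": ["C", "D"],  # a merge commit has two parents
}


def ancestors(commit, parents):
    """All commits reachable from `commit`, itself included (breadth-first)."""
    seen = {commit}
    queue = deque([commit])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for parent in parents[current]:
            if parent not in seen:  # B is reachable through both C and D
                seen.add(parent)
                queue.append(parent)
    return order


print(ancestors("M", parents))  # ['M', 'C', 'D', 'B', 'A']
print(ancestors("D", parents))  # ['D', 'B', 'A']
```

Edges only ever point from a commit to older commits, because a commit's ID is computed from content that already names its parents; that is what keeps the graph acyclic.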

Pages 18-33: Git Internals

[Diagram sequence: the sample project proj/, containing Makefile.PL and lib/Cool.pm, shown as Git objects]

Page 34: Git Internals

References

• “Names” for commits

• Mutable, they point to a specific commit and move to a new one after each commit

• A branch is a reference, a name to a commit

• Special HEAD reference: points to a reference
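A minimal sketch of how these names behave, assuming references are small text entries that hold either a commit ID or, for HEAD, a "ref: " pointer to another reference. The dictionary stands in for files under .git/, and the IDs are made-up placeholders.

```python
# Hypothetical reference store; the 40-character IDs are invented.
refs = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": "3454df" + "0" * 34,
    "refs/heads/test": "23edfc" + "0" * 34,
}


def resolve(name, refs):
    """Follow 'ref: ' indirections until a commit ID is reached."""
    value = refs[name]
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]]
    return value


def record_commit(new_id, refs):
    """After a commit, the branch HEAD points to moves; HEAD itself does not."""
    branch = refs["HEAD"][len("ref: "):]
    refs[branch] = new_id


def switch(branch, refs):
    """Switching branches only rewrites HEAD."""
    refs["HEAD"] = f"ref: refs/heads/{branch}"


print(resolve("HEAD", refs))             # the commit master points at
record_commit("fde45c" + "0" * 34, refs)
print(resolve("refs/heads/master", refs))  # master moved to the new commit
switch("test", refs)
print(resolve("HEAD", refs))             # now the commit test points at
```

Objects never change, so all the movement in a repository happens in this small, mutable layer of names.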

Page 35: Git Internals

[Diagram: the sample project proj/, containing Makefile.PL and lib/Cool.pm, shown as Git objects]

Pages 36-42: Git Internals

[Diagram sequence: the branch references master and test, and the HEAD reference, pointing into the commit graph]

Pages 43-44: Git Internals

Merge

[Diagram: branches master and test]

Pages 45-47: Git Internals

Rebase

[Diagram: branches master and test]

Pages 48-49: Git Internals

Rebase + Merge

[Diagram: branches master and test]
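The difference between merge and rebase can be approximated with a toy model. This is a sketch under simplifying assumptions: commits are dictionary entries whose short ID is derived from their parents and a change label, and no file content is merged. All names are invented for the example.

```python
import hashlib


def make_commit(store, parents, change):
    """Create an immutable commit whose ID depends on its parents and change."""
    cid = hashlib.sha1(repr((tuple(parents), change)).encode()).hexdigest()[:7]
    store[cid] = {"parents": list(parents), "change": change}
    return cid


def merge(store, ours, theirs):
    """A merge adds one new commit that has both tips as parents."""
    return make_commit(store, [ours, theirs], "Merge")


def rebase(store, commits, onto):
    """Replay `commits` (oldest first) on top of `onto` as brand-new commits."""
    tip = onto
    for cid in commits:
        tip = make_commit(store, [tip], store[cid]["change"])
    return tip


store = {}
base = make_commit(store, [], "initial")
m1 = make_commit(store, [base], "work on master")
t1 = make_commit(store, [base], "test 1")
t2 = make_commit(store, [t1], "test 2")

merged = merge(store, m1, t2)           # history keeps both lines of work
rebased = rebase(store, [t1, t2], m1)   # history becomes linear on top of m1

print(store[merged]["parents"] == [m1, t2])  # True
print(rebased != t2)                         # True: same change, new commit
```

After the rebase, master's commit is an ancestor of the rebased test tip, so bringing master up to date needs no extra merge commit; the original t1 and t2 still exist in the store, they are just no longer referenced by the branch.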

Page 50: Git Internals

Non-SCM uses for Git

• Leverage strengths

• immutable

• over the network, pulls transfer only the missing objects

• fast checkout (compared to a copy, there is less to read)

• easy rollback
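The "only missing objects" point can be sketched as a graph walk that stops at anything the receiver already has. This is a simplification that looks at commits only and assumes the local copy holds the full history behind every commit it has; the real transfer also covers trees and blobs, and the names below are invented.

```python
def missing_commits(remote_tip, remote_parents, local_have):
    """Commits reachable from remote_tip that the local repository lacks."""
    missing, seen, stack = [], set(), [remote_tip]
    while stack:
        commit = stack.pop()
        if commit in seen or commit in local_have:
            # Assumption: history behind a commit we already have is present too.
            continue
        seen.add(commit)
        missing.append(commit)
        stack.extend(remote_parents[commit])
    return missing


# Hypothetical remote history A <- B <- C <- D; the local copy stopped at B.
remote_parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
local_have = {"A", "B"}

print(missing_commits("D", remote_parents, local_have))  # ['D', 'C']
```

Immutability is what makes this safe: an ID the receiver already has is guaranteed to name exactly the same content on both sides.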

Page 51: Git Internals

Beware of weak points

• Always stores full copy of files

• not good for backups of DB dumps

• Full history ⇒ more disk space

• this might change as “shallow clones” gain functionality...

Page 52: Git Internals

Content distribution

• Updates done in a master, central repository

• Hierarchy of slave repositories

• Fast sync between repositories, fast checkout

• Can be automated with hooks

• Useful if you have lots of static files, faster than rsync

Page 53: Git Internals

Read-only filesystem

• Design a web server that fetches objects directly from the object database

• Compact storage, efficient retrieval

• Packs of objects are also very VM friendly, mmap ready

• Some open-source solutions are already available

Page 54: Git Internals

Wiki/Ticketing backend

• Use git repository as storage for wiki or ticketing systems

• Good match for distributed development

• Several open-source solutions are already available

• ... but similar to SCM usages

Page 55: Git Internals

That’s all folks!

• I’ll be around #codebits, feel free to ask me stuff

• If you want a Git-as-an-SCM demo, let's get organized and I’ll do an impromptu presentation, or even private lapdan^H^H^H^H^Hdemos

• After #codebits: <{mailto,xmpp}:[email protected]>

Page 56: Git Internals

About Git

• http://git-scm.com/

• Git Internals: http://peepcode.com/products/git-internals-pdf

• Git book: http://progit.org/

About Me

• http://simplicidade.org/notes/

• @pedromelo

• {mailto,xmpp}:[email protected]

• skype:melopt

• http://github.com/melo

• http://www.slideshare.net/melopt