
A tutorial on version control and collaboration using Git

Arunkumar Bupathy

Jawaharlal Nehru Centre for Advanced Scientific Research

[email protected]

August 2, 2019


Overview

1 Introduction - What is Version Control and Why?

2 Preliminaries - Installing and Setting up Git

3 Basic Usage - Creating a repository, tracking and reverting changes

4 Advanced Usage - Remote Repository, Branching, Merging and Collaborating

5 Important Information


Introduction to Version Control

Version control is the process of managing multiple versions of one or more files with the ability to track and revert changes if needed.

Code and documents often go through multiple versions with lots of changes.

You are probably doing some version control already - folders, tag and date based names, snapshots, etc.

These methods are error prone - it is easy to overwrite the wrong version or delete files.

They are hard to keep track of and manage, especially when not done carefully.

It is difficult to tag and date things in the first place - either because we are lazy, or because we are scrambling to get things done and do not bother to tag/date files later.

The idea of using version control software (VCS) is to formalize this and make you systematic.

Version control software makes you do things in a certain way so that version control is a lot easier.

The kind of version control software we will talk about does *not* perform this automatically.

Some good reasons not to automate this:

1 It can lead to too many snapshots, which are difficult to sift through.

2 A “visual diff” is often the only way of knowing what the changes are.

3 When collaborating, there is no tight control over who can modify or overwrite files.

4 It also makes you lazy.

VCS offer some advanced functionality which is painful to achieve manually - merging different file versions, multiple development branches, easy back and forth movement in history, and more.

Once you get the hang of it, it will become second nature.


Git - A Distributed Version Control System

Git is the name of the software that will aid us in version control.

GitHub/GitLab are service providers who offer cloud based hosting for git projects.

Git can be used without any host - you can store your repositories on your local disk.

It is a distributed version control system, meaning every contributor has a complete copy of the project history. (Can work offline, and multiple redundant copies!)


Getting Git and Setting It Up

For Linux-based distributions, git should be easily available through official software repositories. For Ubuntu and derivatives, this would be:

$ sudo apt install git

For macOS, the latest version of git can be downloaded from: https://sourceforge.net/projects/git-osx-installer/files/

Once done, we need to set up some basic things: username, email, preferred text editor (optional), and diff-tool (optional).

In a terminal, type (without the $):

$ git config --global user.name "yourusername"

$ git config --global user.email "<your-email-address>"

$ git config --global core.editor emacs

$ git config --global diff.tool kdiff3

I recommend kdiff3 or meld as good tools for diff/merge.

This is a one-time setup, so worry no more about these.


First Git Project

In the terminal, make a folder for our first git-tracked project (which can be anything) and create a text file:

$ mkdir first_git_project && cd first_git_project

$ echo "This is a text file." > example.txt

Before we can start tracking changes to this file, we need to initialize git in the folder, and then check the status of the new repository:

$ git init

$ git status

The init should be done inside the project folder! You should see git say that there is one untracked file.

Important: To track changes to a file you need to tell git to do so:

$ git add example.txt

$ git status

Alternatively, git add --all adds all the files inside the current folder to be tracked. Try to read what git status says.
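As a quick check of what git add changes, here is a minimal sketch. It assumes a scratch folder made with mktemp (not the tutorial's own project folder) and uses git status --porcelain, a compact, script-friendly form of git status that the slides do not cover.

```shell
# Scratch repository in a temporary folder (hypothetical location).
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "This is a text file." > example.txt

# Before staging: "??" marks an untracked file.
before="$(git status --porcelain)"
echo "$before"

git add example.txt

# After staging: "A" marks a new file that is staged for the next commit.
after="$(git status --porcelain)"
echo "$after"
```

The two-letter codes are only a shorthand for the longer sentences that plain git status prints.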


Commits and Commit Messages

The first time you add any file to your project, it is a good practice to “commit” or store it to the repository, before making further changes:

$ git commit

Git should have opened your favourite editor asking you to input a “commit message” - a brief description of the changes made.

Commits and commit messages are an extremely important part of your project. A good commit message will go a long way in improving your version control experience.

A quicker way of doing this is to include the commit message with the commit command itself:

$ git commit -m "Added example.txt to the project."

$ git log

The second command prints the history of commits made so far in the repository. Every commit has a unique hash value associated with it (more on that later).
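The relation between a commit and its hash can be sketched as follows. The scratch folder and the user name and email are assumptions made only so that the example runs on its own; git rev-parse, which prints the hash a name refers to, is not covered in the slides.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."

full_hash="$(git rev-parse HEAD)"            # the full hash of the commit
short_hash="$(git rev-parse --short HEAD)"   # the abbreviated form
subject="$(git log -1 --pretty=format:%s)"   # first line of the commit message
echo "$short_hash $subject"
```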


Editing a Tracked File

Git is all about tracking changes to files. So let us edit example.txt again.

$ echo "This is an edit." >> example.txt

$ git status

Git should tell you that the file has been modified.

Let us say that we want to keep a snapshot of the file in this state. We do:

$ git add example.txt

$ git commit -m "Modified example.txt"

Notice that git add is not just used to add files to a project. It is also used to tell git which files are to be part of the next commit.

In git speak, this is called staging. Files that are not staged will not be committed!

Now try git log --oneline and git status.
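The difference between a modified file and a staged file can be made visible with git diff. This is a small sketch under assumptions: a scratch folder, a throwaway identity, and the --name-only and --cached options of git diff, which the slides do not introduce.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."

echo "This is an edit." >> example.txt
modified="$(git diff --name-only)"           # changed, but not yet staged

git add example.txt
staged="$(git diff --cached --name-only)"    # staged for the next commit
left_over="$(git diff --name-only)"          # nothing remains unstaged

git commit -q -m "Modified example.txt"
git log --oneline
```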


Working with Multiple Files

Let us now add more files to the project.

$ echo "There is nothing useful here." > useless.txt

$ echo "Ramblings about my love for coffee!" > coffee.txt

$ git add --all

$ git commit -m "Added a useless file and another on coffee."

Say, we edit two of these files:

$ echo "Between coffee and tea, I prefer coffee." >> coffee.txt

$ echo "This is another edit." > example.txt

And say, we are not ready to commit “example.txt” yet, but want to commit “coffee.txt” already.

This is where staging gets useful:

$ git add coffee.txt

$ git commit -m "Modified coffee.txt to show my coffee love."

Now, try editing “example.txt” further, stage it and commit it. You can use your favorite editor to do the editing.


How Git Does It

Git stores the snapshots in a specialized directory called .git within your project’s folder.

This is hidden away from your view so that things are neat and tidy.

Git has an internal copy of the current commit, and a working copy which is what you can edit.

[Figure: Git Internal Storage holds Commit 1, Commit 2 and Commit 3, with the labels master and HEAD; the User Accessible side holds the Working Copy; Staged Changes are also shown.]

There are two pointers: one keeps track of the tip of a branch (the default branch is master), and another, called HEAD, marks the currently checked out commit - usually the last commit.

There can be more than one branch (more on that later).

A checked out commit i.e., HEAD is the one you get to work on.
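The two pointers can be inspected directly. The sketch below assumes a scratch folder and a throwaway identity, and it names the branch master explicitly with git symbolic-ref because newer git installations may choose a different default name; neither git symbolic-ref nor git rev-parse appears in the slides.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."

head_ref="$(git symbolic-ref HEAD)"      # the branch HEAD is attached to
head_hash="$(git rev-parse HEAD)"        # the commit HEAD points to
master_hash="$(git rev-parse master)"    # the commit at the tip of master
echo "$head_ref $head_hash $master_hash"
```

Right after a commit on master, both names resolve to the same commit.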


How to Revert Changes?

Say you have made a change which you wish you had not. How do you go back? For example, you want to go back to a previous version of example.txt.

First, find the commit which has the required version of the file, using git log.

Once you have (more or less) identified the commit, check if you have the right version of the file (replace <commit-hash> with the commit’s hash):

$ git difftool <commit-hash> HEAD -- example.txt

If correct, then check out that file:

$ git checkout <commit-hash> -- example.txt

This file is automatically staged for you, so go right ahead and commit it:

$ git commit -m "Reverted example.txt to the correct state."
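The steps above can be sketched end to end as follows. The scratch folder, the identity and the variable name good are assumptions for illustration, and plain git diff stands in for git difftool so that no visual tool is needed.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."
good="$(git rev-parse HEAD)"              # the commit we will want back

echo "An edit we later regret." >> example.txt
git add example.txt
git commit -q -m "Modified example.txt"

# Compare the old version with the current one (text stand-in for difftool).
git --no-pager diff "$good" HEAD -- example.txt

# Bring back the old version of the file; git stages it automatically.
git checkout "$good" -- example.txt
staged="$(git diff --cached --name-only)"
git commit -q -m "Reverted example.txt to the correct state."
content="$(cat example.txt)"
```

The regretted edit is still in the history, so it can itself be recovered later if needed.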


You Can Merge Two Versions As Well

Instead of going back to a previous version, you might want to merge the previous and the current versions. This is often handy.

As before, use git log to find the commit. Then start the diff tool (not all diff tools have merge capability so this does not apply universally):

$ git difftool <commit-hash> HEAD -- example.txt

Use your difftool’s merge capability to merge the appropriate lines from both files. (I will demo this for kdiff3).

Make sure you save the merged file to the working directory, using the “save as” option to overwrite the working copy.

As usual, stage and commit the file.


How to Revert the Whole Repository State?

What if you want to revert the entire repository to a previous state?

As before, use git log to find the commit which has the required files.

Do git checkout <commit-hash> to go to that commit and check if the files are as you want them to be. (* git checkout is a confusing command. See “Important Git Information” for clarification.)

Once you are confident of the commit you wish to revert to, go back to the tip of the current branch (master, in this case) by issuing:

$ git checkout master

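master is a branch name, so there is no need to use a hash value here. Finally, to return the project to that state, revert every commit made after it:

$ git revert --no-commit <commit-hash>..HEAD

$ git commit -m "Reverted the project to an earlier state."

Instead of going back in history, git revert records new changes on top of the current branch that undo the given commits. (git revert <commit-hash> on its own undoes only the changes introduced by that one commit.)

This behaviour is intentional and essential to prevent permanently deleting files from the project history.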
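One point deserves a worked example: git revert given a single commit undoes only the changes that commit introduced. To bring the whole project back to the state of an earlier commit, one option is to revert the range of commits made after it. The sketch below assumes a scratch folder, a throwaway identity, a linear history without merge commits, and the hypothetical variable name target.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > example.txt
git add example.txt && git commit -q -m "Commit 1"
target="$(git rev-parse HEAD)"            # the state we want to return to

echo "second" >> example.txt
git add example.txt && git commit -q -m "Commit 2"
echo "third" > coffee.txt
git add coffee.txt && git commit -q -m "Commit 3"

# Undo everything after $target, newest first, without committing yet...
git revert --no-commit "$target"..HEAD
# ...then record the combined undo as one new commit.
git commit -q -m "Reverted the project to the state of Commit 1."

git log --oneline
```

The history now has four commits; nothing was deleted, and the files match Commit 1 again.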


Undoing Changes to the Working Directory

Another case is when you have staged changes, but you want to unstage some of the files (maybe because you do not want them going into the next commit just yet). For example, make some changes to some of the files, and stage them:

$ echo "I drink two coffees a day." >> coffee.txt

$ echo "Is this a good edit?" >> example.txt

$ git add --all

$ git status

Did you notice that the git status command told you how to unstage a staged file? Let us do just that:

$ git reset HEAD -- example.txt

$ git status

$ git commit -m "Added info about my coffee intake frequency."

The changes you made to example.txt are not lost, just unstaged.

What if you want to discard changes to example.txt? The git status command just told you how to do that! Try it out.
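Unstaging and discarding can be sketched together as below. The scratch folder and identity are assumptions. For the discard step the sketch uses git checkout -- <file>, which is the hint older git versions print in git status; newer versions may suggest git restore instead.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
echo "Ramblings about my love for coffee!" > coffee.txt
git add --all
git commit -q -m "Added two files."

echo "I drink two coffees a day." >> coffee.txt
echo "Is this a good edit?" >> example.txt
git add --all

# Unstage example.txt; the edit itself stays in the working copy.
git reset -q HEAD -- example.txt
staged="$(git diff --cached --name-only)"
git commit -q -m "Added info about my coffee intake frequency."
modified="$(git diff --name-only)"

# Discard the edit: the file returns to its last committed content.
git checkout -- example.txt
clean="$(git status --porcelain)"
```

Unlike unstaging, discarding cannot be undone, because the edit was never committed.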


Setting up a Remote Repository

A remote repository is usually a special git repository (called a bare repository) that is used as central storage, a kind of backup.

When multiple users collaborate, it acts as a central synchronization point. It also works for a single user working from multiple machines.

It can be on your local disk, a backup disk, a USB drive, the cloud (don’t autosync though!), etc. Let us create one for our project.

Open a separate terminal and navigate to your home folder (for illustration only; this can be anywhere you want).

We will now make the remote repository (with a .git suffix to remind ourselves that it is a remote repo):

$ mkdir first_git_project.git

$ cd first_git_project.git

$ git init --bare

Our bare repository is set up. You can now close this terminal window. There is a bit more configuring to do though.

Now navigate back to our project folder. We are going to configure our project to use the remote we just set up in the previous step:

$ git remote add origin /path/to/remote/first_git_project.git

$ git remote -v

Here origin is just a name to identify the remote repository. The second command is just to verify the remote repository.

You must be careful when providing the path to the remote repository. Always give absolute paths, not relative paths.

Let us try to push our repository to the remote, specifically the master branch:

$ git push origin master

This is probably a good place to tell you to use separate remote repositories for each of your projects. Also, the remote is not to be manually edited ever.
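The whole remote setup can be tried safely in a scratch area. In this sketch the folder from mktemp, the identity and the variable name base are assumptions, and git ls-remote (not covered in the slides) is used to ask the remote which commit its master branch holds.

```shell
# Scratch area holding both the bare remote and the project (hypothetical).
base="$(mktemp -d)"
git init -q --bare "$base/first_git_project.git"

mkdir "$base/first_git_project"
cd "$base/first_git_project"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."

# Register the remote by absolute path, verify it, and push master to it.
git remote add origin "$base/first_git_project.git"
git remote -v
git push -q origin master

local_hash="$(git rev-parse master)"
remote_hash="$(git ls-remote origin refs/heads/master | cut -f1)"
```

When the two hashes match, the remote holds the same commit as the local master branch.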


Cloning from a Remote Repository

The remote is now fully functional and other people (or yourself on another machine) can clone the project from it.

Open a separate terminal window, make a temporary folder, and try to clone from our remote repository:

$ cd $HOME && mkdir tmp && cd tmp

$ git clone /path/to/remote/repo/first_git_project.git

$ cd first_git_project

$ git log --oneline

Voila! We have successfully cloned the repository.

For the rest of the tutorial, we are going to pretend that this temporary folder is actually a different user.


Fetching and Merging Changes in the Remote

Let us make an edit here (in the temporary folder) and push it to the remote repository:

$ echo "Working from different location." >> example.txt

$ git add example.txt

$ git commit -m "Added line on working remotely to example.txt"

$ git push origin master

Now we will try to fetch and merge this to our original project.

In the directory where the original project is, do:

$ git fetch origin

$ git merge origin/master

$ git log --oneline

We should see the changes made in the temporary folder reflected here as well.

The fetch command fetches all the changes that have been pushed to the remote, but does not actually merge them to your working project.
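That fetch and merge are separate steps can be checked by looking at where master and origin/master point before and after the merge. The sketch assumes a scratch area with hypothetical folder names (remote.git, original, tmp) and a throwaway identity; the clone is told to use master explicitly with -b, since a freshly made bare repository may default to another branch name.

```shell
# Scratch area: a bare remote, the original project, and a second copy.
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"

git init -q "$base/original"
cd "$base/original"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt && git commit -q -m "Added example.txt to the project."
git remote add origin "$base/remote.git"
git push -q origin master

# The "other user": clone, edit, commit and push.
git clone -q -b master "$base/remote.git" "$base/tmp"
cd "$base/tmp"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Working from different location." >> example.txt
git add example.txt
git commit -q -m "Added line on working remotely to example.txt"
git push -q origin master

# Back in the original: fetch only updates origin/master...
cd "$base/original"
git fetch -q origin
before="$(git rev-parse master)"
fetched="$(git rev-parse origin/master)"
# ...and merge brings master up to date.
git merge -q origin/master
after="$(git rev-parse master)"
```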


File Conflicts

A frequently encountered problem is a file conflict. This happens when the same parts of the same file are edited on two different machines (or by two different users).

Let us simulate such a conflict, by editing example.txt in the two copies of the project that we have.

Edit the same line, but put two different strings. For instance, edit the following line (if you have it) in example.txt as:

Working from different location.

---> Working xxxx xxxx location. (in the original)

---> Working adsf asdf location. (in the temp folder)

Commit the changes in both places with different commit messages, and push one of them to the remote.

$ git push origin master

Now go over to the other folder, and try pushing the changes to the remote. Git will refuse the push and print some messages.


Conflict Resolution

Although git suggests running git pull, we are going to do this:

$ git fetch origin

$ git merge origin/master

Again, git will fail to merge and complain that there are merge conflicts.

We need to find and fix them. git status will tell you where to begin.

In our case, the problem is in example.txt. Open it in a text editor, and search for <<<<<<< or HEAD.

You should see that git has kept the conflicting line from both versions for you to decide.

The lines above ======= (after <<<<<<< HEAD) come from your current branch, and the lines below it (before >>>>>>> origin/master) come from the remote repository.

Edit the file the way you want it to be and save it. (Don’t forget to remove the marker lines starting with <<<<<<<, ======= and >>>>>>>.) Then, as usual, stage the file and commit it.
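The conflict and its resolution can be rehearsed in a scratch area before it happens in a real project. The folder names, the identity and the final resolved text in this sketch are all hypothetical; the failing push and merge are wrapped in if so that the script keeps going after git reports the problem.

```shell
# Scratch area: a bare remote, the original project, and a second copy.
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"
git init -q "$base/original"
cd "$base/original"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "Working from different location." > example.txt
git add example.txt && git commit -q -m "Initial version."
git remote add origin "$base/remote.git"
git push -q origin master
git clone -q -b master "$base/remote.git" "$base/tmp"

# Conflicting edits to the same line, committed in both copies.
echo "Working xxxx xxxx location." > example.txt
git add example.txt && git commit -q -m "Edit in the original."
cd "$base/tmp"
git config user.name "Example User"
git config user.email "user@example.com"
echo "Working adsf asdf location." > example.txt
git add example.txt && git commit -q -m "Edit in the temp folder."
git push -q origin master

# The original can no longer push, and the merge stops with a conflict.
cd "$base/original"
if ! git push -q origin master 2>/dev/null; then echo "push rejected"; fi
git fetch -q origin
if ! git merge origin/master >/dev/null 2>&1; then echo "merge conflict"; fi
conflicted="$(git status --porcelain)"        # "UU" marks a conflicted file
markers="$(grep -c '^<<<<<<<' example.txt)"   # number of conflict blocks

# Resolve by writing the final text, then stage, commit and push.
echo "Working from both locations." > example.txt
git add example.txt
git commit -q -m "Resolved the conflict in example.txt."
git push -q origin master
remote_hash="$(git ls-remote origin refs/heads/master | cut -f1)"
```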


Branches

Sometimes you do not want to work on the master branch. You want to keep your edits (which may be experimental) separate from the master branch (which you consider stable).

Especially when collaborating, everyone except the owner of the project should use a separate branch for their changes, which the owner can later merge with their master branch.

Let us create a branch called experimental (can be anything) and switch to it:

$ git branch experimental

$ git checkout experimental

Now you can continue editing and committing files to experimental:

$ echo "Is this better?" >> example.txt

$ git add example.txt && git commit -m "An experimental edit."

You can also switch (i.e., check out) back to the master branch and do edits independently of the other branch.
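That a commit on experimental leaves master untouched can be confirmed by counting the commits each branch has that the other lacks. The sketch assumes a scratch folder and a throwaway identity, and uses git rev-list --count, which the slides do not cover.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt && git commit -q -m "Added example.txt to the project."

git branch experimental
git checkout -q experimental
echo "Is this better?" >> example.txt
git add example.txt && git commit -q -m "An experimental edit."

git checkout -q master
ahead="$(git rev-list --count master..experimental)"    # only on experimental
behind="$(git rev-list --count experimental..master)"   # only on master
current="$(git rev-parse --abbrev-ref HEAD)"
echo "$current: experimental is $ahead ahead and $behind behind"
```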


Merging Branches

Once you are satisfied with your experimental edits, or a collaborator has pushed over a branch with his/her edits, you could merge these.

Make sure you are on the main branch into which you want the experimental branch merged, by checking it out!

$ git checkout <main-branch>

To merge a branch to the currently checked out branch:

$ git merge <experimental-branch>

Resolve merge conflicts if any as usual.
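A merge of two branches that have both moved on can be sketched as follows. The scratch folder, identity and commit messages are assumptions; the merge message is passed with -m so that no editor opens. Because the two branches touch different files, no conflict arises here.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt && git commit -q -m "Added example.txt to the project."

# One commit on experimental...
git branch experimental
git checkout -q experimental
echo "Is this better?" >> example.txt
git add example.txt && git commit -q -m "An experimental edit."

# ...and an independent commit on master.
git checkout -q master
echo "Ramblings about my love for coffee!" > coffee.txt
git add coffee.txt && git commit -q -m "Added coffee.txt on master."

# Merge experimental into the checked out branch (master).
git merge -q -m "Merged experimental into master." experimental
second_parent="$(git rev-parse HEAD^2)"   # a merge commit has a second parent
git log --oneline
```

After the merge, master contains both the coffee.txt commit and the experimental edit.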


Collaborating

Since this is already a long tutorial, I wish to stop here.

If you are interested, we can go over to GitHub, create an account and follow a tiny tutorial there.

They have further tutorials to get you started with collaboration. But all the concepts learned here will naturally carry over.

Thanks for your patience!

I would like to acknowledge Prof. Stefano Allesina, Dept. of Ecology and Evolution, University of Chicago, for his course on computational tools for scientists, where I learned some of this.


Important Git Information

git add does two things. The first time you use it on a file, it starts tracking the file. Subsequent uses stage that file to be committed.

git checkout does three different things depending on the arguments provided to it.

In the first mode, it checks out a given file from a given commit to the working directory:

$ git checkout <commit-hash> -- <filename>

In the second mode, it moves the HEAD pointer to a given commit, allowing you to inspect the files in that state:

$ git checkout <commit-hash>

You should not make edits in this so-called “detached HEAD” state: git will let you commit, but such commits belong to no branch and are easily lost. Instead, create a branch from here and then perform edits on that branch.

In the third mode, it switches from the current branch to a given branch:

$ git checkout <branch-name>
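The second mode and the way out of it can be sketched as below. The scratch folder, the identity and the branch name from-old-commit are hypothetical; git checkout -b, which creates a branch and switches to it in one step, is a shorthand for the git branch and git checkout pair used earlier.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt && git commit -q -m "Added example.txt to the project."
first="$(git rev-parse HEAD)"
echo "This is an edit." >> example.txt
git add example.txt && git commit -q -m "Modified example.txt"

# Second mode: HEAD now points at a commit, not at a branch.
git checkout -q "$first"
detached="$(git rev-parse --abbrev-ref HEAD)"   # prints HEAD when detached

# Create a branch at this commit so that new edits have a home.
git checkout -q -b from-old-commit
attached="$(git rev-parse --abbrev-ref HEAD)"
echo "$detached -> $attached"
```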

git merge does two different things depending on the arguments.

In the first mode, it merges a given branch to the current branch:

$ git merge <branch-name>

In the second mode, it merges a recently fetched remote branch to the current branch:

$ git merge <remote-name>/<branch-name>

When checking out between commits or branches, make sure that you stage or commit changes in the working directory to the repository!

Don’t be a commitment-phobe! Commit often and write good commit messages.

Don’t edit too many files before committing; it will be hard to come up with a good commit message.

Git is not designed to handle very large files. So don’t store your raw data like particle configurations.

When “diff”ing two file versions, the command is actually:

$ git difftool <old-commit> <new-commit> -- <file-to-compare>

The use of -- tells git that what follows is a file-name (not a commit).

The two compared versions can be anything, including HEAD. But a neat way of referring to previous versions is as:

$ git difftool HEAD^ HEAD -- file-to-compare

$ git difftool HEAD^^ HEAD -- file-to-compare

$ git difftool HEAD~2 HEAD -- file-to-compare

The three commands compare HEAD with a commit one version back, 2 versions back and 2 versions back, respectively. (The symbol in the last command is a tilde.)

You can also use git diff, git’s internal command line diff tool, if you do not have a visual diff tool.

The first 7 characters of a commit’s hash are usually enough to refer to it. git log --oneline gives you exactly this!
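That HEAD^^ and HEAD~2 name the same commit can be checked with git rev-parse. The sketch assumes a scratch folder, a throwaway identity and three commits made in a loop; git diff stands in for git difftool.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
    echo "line $n" >> example.txt
    git add example.txt
    git commit -q -m "Commit $n"
done

one_back="$(git rev-parse HEAD^)"
two_back_caret="$(git rev-parse HEAD^^)"
two_back_tilde="$(git rev-parse HEAD~2)"
subject="$(git log -1 --pretty=format:%s HEAD~2)"

# Text diff between the version two commits back and the current one.
git --no-pager diff HEAD~2 HEAD -- example.txt
```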

You will want to create aliases for longer git commands, like this:

$ git config --global alias.aliasname "gitcommand"

For example (pick names that are not already git commands; shortlog, for instance, is a built-in command, and git ignores an alias that hides a built-in):

$ git config --global alias.short "log --oneline"

$ git config --global alias.unstage "reset HEAD --"

You can then simply use:

$ git short

$ git unstage <filename>

The git log command can produce prettier output and show lots of other useful things. For instance:

$ git log --pretty=format:"%h - %an, %ar: %s"

outputs a short commit hash, author name, relative commit date and the first line of the commit message.

The %h and so on are placeholders that print different things. More can be found using man git-log.
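As a small sketch of how the placeholders expand, the block below builds one log line from %h and %s and then prints the longer format. The scratch folder and the identity are assumptions; the author name and the relative date in the last line of output will depend on your setup and on when you run it.

```shell
# Scratch repository with a throwaway identity (both hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "This is a text file." > example.txt
git add example.txt
git commit -q -m "Added example.txt to the project."

# %h expands to the short hash, %s to the first line of the commit message.
line="$(git log -1 --pretty=format:"%h - %s")"
echo "$line"

# The longer format also shows the author (%an) and a relative date (%ar).
git --no-pager log --pretty=format:"%h - %an, %ar: %s"
```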
