GIT Add Remote

What GIT Add Remote Is

Unlike version control systems that maintain their repository of committed versions exclusively on a server, requiring you to be online to commit changes, Git works offline by default. For solo projects, this means that you can benefit from powerful version control without having access to a server or needing to set up an account somewhere.

 

But working solo is not really why people come to Git. People usually come to Git because they want to collaborate.

There’s an old saying (which, like the word “bug,” is popularly attributed to Grace Hopper): A ship in port is safe, but that’s not what ships were built for. Right now, all these commits you’ve been making are safe in the proverbial harbor that is your computer. Let’s send them on a voyage.

 

A remote repository (as opposed to a local one on your computer) is a copy of a Git project that lives somewhere else: another computer on your network, someone else’s computer somewhere else, an online service like GitHub, or anywhere other than the directory you’re looking at right now. In fact, strictly speaking, when I talk about your “local” repository, I’m referring only to the one you happen to be working with right now.

 

You can even ask Git to push and pull changes to a second local copy stored in a different folder on your own computer, and that second copy would be considered a remote.
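Here is a minimal sketch of that idea, run entirely inside a scratch directory. The folder names (project, backup.git) and the remote name backup are made up for illustration. The second copy is created as a bare repo, since by default Git refuses pushes into a branch that is checked out in a regular working copy.

```shell
# Scratch directory, so this experiment never touches a real project.
work=$(mktemp -d)

# A small project with a single commit on master.
git init -q "$work/project"
git -C "$work/project" symbolic-ref HEAD refs/heads/master
git -C "$work/project" config user.name "Example Author"
git -C "$work/project" config user.email "author@example.com"
echo "<h1>Hello</h1>" > "$work/project/index.html"
git -C "$work/project" add index.html
git -C "$work/project" commit -q -m "Add homepage"

# A second copy in a different folder on the same computer.
git init -q --bare "$work/backup.git"

# Register that folder as a remote named "backup" and push to it.
git -C "$work/project" remote add backup "$work/backup.git"
git -C "$work/project" push -q backup master

# Both copies should now point at the same commit.
local_tip=$(git -C "$work/project" rev-parse master)
remote_tip=$(git -C "$work/backup.git" rev-parse master)
echo "local:  $local_tip"
echo "remote: $remote_tip"
```

As far as Git is concerned, the folder is a remote like any other; only the address differs.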

 

Remotes are one of Git’s most successful abstractions. Unlike branches, which are wholly virtual copies of your project, each remote corresponds to an actual, physical copy of your repository with which you can exchange data. In this post, I explain how to add a Git remote. Most of the things you’ll need to do to send and receive changes with a remote have been neatly wrapped up into two verbs: push and pull, which do more or less what you’d expect.

 

What is GIT HUB

Git’s decentralized design allows you to push and pull changes between any two computers: if you wanted to, you could push commits from a branch on your computer directly to a branch on your teammate’s computer, and vice versa. And while this seems cool, for most teams it introduces a lot of complexity without a lot of benefit.

 

Instead, many teams share code via Git through what I’ll call the hub model. It’s centralized in a good way: you and your team keep a shared copy of a project on a remote server (the hub), where it’s accessible to everyone on the team.

 

Each team member who joins the project copies (or clones) the project repository to their own computer, makes and commits changes there, and then uses the git push and git pull commands to synchronize their repo with the one stored on the server. There’s nothing special about remote repositories: they’re just instances of the project, stored somewhere accessible so that you can push or pull commits to or from them.

 

In theory, Git doesn’t consider any one repository to be the canonical one for a given project, although in practice most teams have a single shared remote copy (often hosted on GitHub) that they consider the primary one—what Git conventionally calls the origin. As with master branches, what “primary” means is up to you, and the origin remote is what you make of it.

 

The hub model, though, views the origin remote as canonical, and so from the perspective of your team members, your changes aren’t truly checked in until they’re both committed and pushed to the server for others to access.

 

The hub also serves as a reliable backup of the code in the event that a contributor’s own copy of the project gets corrupted or lost somehow, or if someone gets a new computer and needs to pull down a copy of his or her work. Rather than just copy files from one laptop to another, it’s often easiest to re-clone the Git project from the hub to the new machine.

 

This of course presumes that the hub copy is never lost or corrupted, but Git’s decentralized design helps us out here. Although the hub is the most canonical backup copy of your repo, every copy contains the complete history of your project.

 

For work to be truly lost, it would have to disappear from everyone’s computers, which is unlikely to say the least. In the improbable event that the hub becomes compromised, any local repo can be used to spawn a new remote.

 

WHAT LIVES ON THE SERVER?

Server-side repos are what are called “bare” repos, consisting only of the actual repository data (old versions, branches) and no working copy (which also means no staging area). A directory containing a bare Git repo is usually marked by appending .git to its name, as in our hypothetical our-website.git.

 

The insides of a bare Git repo directory are virtually identical to what you’d find in the hidden .git directory in your local working directory, with subdirectories for objects, branch pointers, and other stuff Git needs.
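To see this for yourself, a quick sketch along these lines creates a bare repo in a scratch directory and looks inside it. The name our-website.git comes from the example above; the scratch location is an assumption of this sketch.

```shell
work=$(mktemp -d)

# --bare creates repository data only: no working copy, no staging area.
git init -q --bare "$work/our-website.git"

# The top level holds what would normally be hidden inside .git.
ls "$work/our-website.git"

# Git itself can confirm that the repo is bare.
is_bare=$(git -C "$work/our-website.git" rev-parse --is-bare-repository)
echo "bare: $is_bare"
```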

 

Our server-side Git repo contains all of the commits that have been pushed to it, as well as its own set of branches. It’s this additional, remote set of branches that can confuse the heck out of newcomers, because while it’s natural for us to assume there’s always a one-to-one relationship between a branch on our computer and one on the server, and while that’s usually how it goes, Git doesn’t require such a relationship and therefore doesn’t enforce it.

 

True to form, the main way Git compels you to deal with this loose coupling between local and server-side branches is by requiring you to be more specific in your commands.

 

For example, to push changes from one of your local branches to its twin on the server, it’s often not enough to say just git push. Git may prefer that you say git push <remotename> <branchname>, even if it seems to us like both the remote name and branch name can be inferred from context.

 

WHERE’S THE REMOTE?

A repository’s location relative to your local repository is what qualifies it as a remote. In other words, a remote is ... elsewhere. Where is that, exactly?

 

For most of you, most of the time, your remote repository will live on GitHub. GitHub is the most popular hosting service for Git repositories by such a wide margin that it seems ridiculous to write this blog as if there are alternatives. Even if your team never hosts projects on GitHub, you’re certain to interact with a repo hosted on GitHub at some point in your work.

 

To be sure, GitHub’s service is both very inexpensive—free if your project is open source or at least browsable by the public, with cheap paid plans available if you need private code sharing—and very easy to use.

 

Many other options exist, however: both other hosted services and ways to self-host Git repositories. If you’re not willing or able to manage your own servers, a hosted service like GitHub is the best choice—they do all the heavy lifting so you can focus on your project. But depending on the kind of work you’re doing, or the kind of organization you’re doing it for, you may have to ensure that your source code is stored in-house.

 

Fortunately, although different services may have different tools or interfaces for creating remote repositories, they all function the same way once they’re set up.

 

ADDING YOUR FIRST REMOTE

You can pass a remote’s URL as a parameter to each of the Git commands I just mentioned, which is fine if you only need to push or pull changes once and never again. Most of the time, though, you’ll work with the same remotes over and over again during a project’s lifespan. Instead of referring to remotes by their URLs, you can assign names to each remote you work with, and refer to it by its name instead.

 

At this point, of course, we have our own local copy of the project stored on our computer. But let’s say we also have a remote Git repo (our-website.git) stored on our own server, gitforhumans.info, which we’d like to set up as the origin for our project. To do this, we’ll use the git remote add command. Switch back to the Terminal and enter this command:

(master) $: git remote add origin https://gitforhumans.info/our-website.git

 

I should point out that git remote is a new kind of command for us: one with subcommands. Whereas all the commands we’ve used so far have had just a single, one-word command name (e.g., git commit), all the commands related to configuring remotes are namespaced; that is, they’re all two-word commands starting with remote: remote add, remote rm, and so on.

 

Typing just git remote, with no subcommand, instructs Git to show us a list of all the remotes we’ve added to this project, similar to how git branch shows a list of branches. As you can see, we only have one: origin:

$: git remote

origin
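If you want more than the bare list of names, a sketch like the following may help. It uses the same origin URL as above inside a throwaway repo; adding a remote only records a name and a URL, so nothing here needs network access. The name hub used for the rename is purely illustrative.

```shell
work=$(mktemp -d)
git init -q "$work/our-website"
cd "$work/our-website"

# Adding a remote only records a name and a URL; no server is contacted yet.
git remote add origin https://gitforhumans.info/our-website.git

# -v (verbose) lists each remote along with its fetch and push URLs.
git remote -v

# get-url prints the URL stored for a single remote.
origin_url=$(git remote get-url origin)
echo "origin is $origin_url"

# Remotes can be renamed or removed later without touching any commits.
git remote rename origin hub
git remote rm hub
remaining=$(git remote)
```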

 

Note that if you started out by cloning the project to your computer from a remote server, using the git clone command, you’ll find this step is already done for you. Repositories you clone from a remote always come preconfigured with that remote set as their origin.

 

Just as your project’s primary branch has a conventional name (master), so does its primary remote: origin. (Notice how this simple yet effective naming convention reinforces the “hub” role for the remote repository: semantically, the remote is the origin for your project’s code, and all of your local repos are just satellites orbiting the hub.)

 

Although origin is the conventional name, you can name remotes anything you want. Unless you have a really compelling reason, though, it’s best to stick with convention and go with origin for your project’s primary remote home.

 

Understanding remote URLs

 

Git networking protocols

Git supports three different networking protocols for moving commits and other data across networks: the Git protocol, SSH, and HTTP. In day-to-day practice, all three behave the same way. Git’s protocols differ only in how you authenticate yourself with the server (that is, how you identify yourself and prove that you’re you) and whether each one supports reading and writing changes, or just reading.

 

Each of these example URLs refers to the same repo (hello.git, on the server named gitforhumans.info) using each of Git’s three protocol options. Most Git hosting services offer at least HTTPS and SSH.
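The URLs in question would look roughly like the ones in this sketch. The server name and repo name come from the text; the git user in the SSH address is a common hosting convention that I am assuming here, and the remote names (via-git, via-ssh, via-https) are arbitrary labels. Nothing is contacted over the network, because git remote add only stores the address.

```shell
work=$(mktemp -d)
git init -q "$work/hello"
cd "$work/hello"

# Git protocol: explicit git:// prefix.
git remote add via-git git://gitforhumans.info/hello.git

# SSH: no protocol prefix, written as user@host:path.
git remote add via-ssh git@gitforhumans.info:hello.git

# HTTPS: the same kind of URL a browser would use.
git remote add via-https https://gitforhumans.info/hello.git

# Show all three side by side.
git remote -v
remote_count=$(git remote | wc -l | tr -d ' ')
```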

 

SSH (Secure Shell)

Git’s SSH protocol is the exact same one many of us use to log in to remote servers every day. In fact, any SSH server you have access to can probably be used as a host for remote Git repositories. SSH remotes support both reading and writing, and you can use any authentication method SSH supports.

 

While Git doesn’t have a default protocol, per se, SSH is so widely used for securely sharing Git repositories online that it has become a sort of default, a status Git reinforces by not requiring a protocol prefix for SSH URLs. Put another way, if you omit the protocol part and write the address in the form user@host:path, Git assumes you mean SSH. GitHub’s longtime default URL format for private repo access (e.g., git@github.com:username/reponame.git) uses SSH.

 

One drawback to the SSH protocol limits its usefulness in today’s open-source ecosystem: it only works for private repositories, because SSH has no way of allowing someone to access resources without authentication. (It is a secure shell, after all.) Therefore, you may have to rely on a different protocol if you want to offer public access to some or all of your repos.

 

Fortunately, most Git hosts offer support for multiple protocols, so you can use HTTP to allow the public to download the latest stuff from your hot new JavaScript framework’s master branch, while using SSH within your team to push commits to that branch.

 

SSH Git remotes, like many SSH servers, support logging in with a username and password, but it’s more common to identify yourself using public key authentication, whereby you generate a unique, secure key pair and upload the public key to your account on a Git server such as GitHub, keeping the private key safe on your own computer.

 

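When you access a remote from that server, Git (or rather, SSH, working on Git’s behalf) uses your private key to prove your identity, much like showing an ID badge. The private key itself never leaves your computer; the server only checks the proof against the public key you uploaded.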

 

Git newcomers can find working with key pairs daunting and unfamiliar, but in return for this added complexity we get both security and (beyond the initial setup step) ease-of-use. Because each user generates a unique key pair on their own computer, it’s easy for server administrators to manage precisely who has access to which projects, especially when using hosting services like GitHub or Bitbucket, which offer great tools for managing users and keys.

 

HTTPS

This is, of course, the same HTTP we use to deliver content over the web. These days, many Git hosts (notably including GitHub) have made HTTPS URLs the default, partly because they’re easier to use (you can authenticate HTTPS remotes with a username and password, rather than an SSH key), and partly because they’re more versatile.

 

Whereas SSH must be private, and must allow read and write access to your repositories, HTTPS offers more flexibility. You can allow anyone on the internet to pull down changes from your repo, while restricting push access to members of your own team.

 

Git protocol

Only the Git protocol is unique to Git, but these days it’s rarely used, largely because it’s read-only. This once made it a good choice for serving up public repos (say, on GitHub), and it paired nicely with SSH for projects that needed both public and private access. Today, however, HTTPS is a better choice.

 

Which should you use?

On purely private projects—if you’re working on commercial software, say, rather than on open-source code—SSH is an excellent choice, and the most widely supported. That said, if you want the simplest, most consistent experience, I recommend using HTTPS whenever possible. Though SSH keys aren’t hard to manage, they still aren’t as easy to use as a username and password, and the fact that HTTPS URLs can be made public makes them easier to share.

 

WORKING WITH REMOTE BRANCHES

This may sound obvious, but the main difference between working with branches and working with remotes is that remotes are on another computer. When you’re working with branches, you’re mainly concerned with managing different versions of your work stored on your own computer, within what I (and Git) call your local copy. With remotes, just as with branches, you’re still managing different versions.

 

In fact, your interactions with remotes will almost always be in the context of a branch. Once you’ve committed a change to a branch on your local repository, you can use git push to submit your copy of that branch—and all the new commits you’ve added—to the server. Whenever you need to refresh your copy of a branch with everyone else’s latest changes, you use git pull.

Let’s look at some examples of how you’ll use these new commands in practice, starting with pushing.

 

Pushing changes

Having worked on our new homepage design for a while, we’ve discovered a bug in some JavaScript we’ve written. Someone else on the team has offered to help fix the problem, but first we need to get our changes into her copy of the project. To do this, we need to push the new-homepage branch from our computer to the server, where our teammate can find and pull from it.

 

The command we need here is git push <remote> <branch>. Again, Git wants us to be explicit here, listing exactly which remote we want to push to (origin), and which branch we want pushed (new-homepage). This is our first time accessing this particular remote, which is password-protected, so Git will prompt us to enter our credentials when we try to push or pull initially:

$: git push origin new-homepage
Username for 'https://gitforhumans.info': ddemaree
Password for 'https://ddemaree@gitforhumans.info':
Counting objects: 8, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (8/8), 743 bytes | 0 bytes/s, done.
Total 8 (delta 1), reused 0 (delta 0)
To https://gitforhumans.info/our-website.git
* [new branch] new-homepage -> new-homepage

Git does several things on our behalf when we push changes, and this long, convoluted response tells us about each one. First, in the initial lines after the password prompt, Git packs up and sends our commits over the network:

Counting objects: 8, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (8/8), 743 bytes | 0 bytes/s, done.
Total 8 (delta 1), reused 0 (delta 0)

There’s nothing we need to know in this block of text; it’s saying that Git was able to pack up and send our data to the server successfully.

The next line is much more relevant for us: * [new branch] new-homepage -> new-homepage

 

Here, Git tells us that the remote server received our branch called new-homepage, and from it created a new branch on the server, also called new-homepage. Git doesn’t require remote branches to have the same names as their local counterparts. However, for the sake of everyone’s sanity, it’s customary to keep branch names consistent.
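To make the naming point concrete, here is a small sketch that pushes the same local branch twice: once under its own name, and once under a different name using the <local>:<remote> form. A bare repo in a scratch directory stands in for the server, and the name homepage-redesign is invented for this example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/our-website.git"   # stands in for the server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/our-website.git"

# One commit on master, then a second on new-homepage.
echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Start homepage"
git checkout -q -b new-homepage
echo "<p>New design</p>" >> index.html
git commit -q -am "Redesign homepage"

# Usual form: the server-side branch gets the same name as the local one.
git push -q origin new-homepage

# <local>:<remote> form: same commits, different name on the server.
git push -q origin new-homepage:homepage-redesign

# List the branches the server now has.
server_branches=$(git ls-remote --heads origin | awk '{print $2}' | sort)
echo "$server_branches"
```

Both server-side branches point at the same commit, which is exactly why mismatched names get confusing so quickly.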

 

Pulling changes

 

It’s later in the day, and we’ve come back from getting a coffee to find that our teammate has submitted her changes, fixing the bugs in our JavaScript. Now it’s time to get the changes she has committed to the new-homepage branch into our copy of the branch, by updating our branch using git pull <remote> <branch>.

 

Here again, Git asks us to be maddeningly explicit, specifying the remote and branch names:

$: git pull origin new-homepage
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://gitforhumans.info/our-website.git
* branch new-homepage -> FETCH_HEAD
Updating fed3ac5..4f82376
Fast-forward
carousel.js | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

 

As with git push, the response includes several lines (beginning with remote:) that explain how data is being transferred between the two repos, which isn’t very interesting. Let’s skip past that, to where there is an interesting detail:

From https://gitforhumans.info/our-website.git
* branch new-homepage -> FETCH_HEAD

 

Here, where you might expect Git to say it has pulled changes from the server’s copy of new-homepage to our local copy of the same branch, the little ASCII arrow is instead pointing to something called FETCH_HEAD. To explain this, let me step back a bit and show you how pushes and pulls work behind the scenes.

 

Whenever you push or pull a branch, two things need to happen, both of which are reflected in this response from git pull.

First, Git needs to transfer a bunch of objects (that is, your commits and the files whose changes they’re tracking) to or from the server. All those remote lines cover this part of the process, and the reason I can confidently tell you to ignore them is that it’s exceedingly rare to run into problems there.

 

The riskiest part of sending data between two computers is the possibility of one machine’s data accidentally overwriting the other’s without realizing it, resulting in data loss.

 

One of the most wonderful aspects of Git’s architecture is that it’s virtually impossible for commits to conflict with each other, so sending or receiving objects is extremely safe. The worst side effect is that one copy ends up with too much data, but there’s almost no risk of losing anything.

 

Once all the new commits are safe on your computer, we get to the second part: a merge:

Updating fed3ac5..4f82376
Fast-forward
carousel.js | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

 

Because there weren’t any other commits on our side since we handed this branch off to our colleague, Git is able to merge it back in as a simple fast-forward.

 

Git does this elaborate, two-step, copy-and-merge dance in order to ensure the safety of the work we’ve committed to our copy of new-homepage. Although copying a bunch of commits between computers is safe, as we’ve seen, merging branches sometimes creates conflicts that Git can’t resolve on its own.

 

What’s more, even though with git pull we’re asking Git to merge a server-side branch into one of our local ones, when pulling Git actually does all of the merging work on the local side, which means it needs to copy the server’s new-homepage branch to somewhere on our computer before attempting to merge it into our branch. FETCH_HEAD is that somewhere. It’s a temporary reference, which behaves much like a branch, that Git has created as a buffer for merging in these newly fetched changes.

 

It’s important to remember that merging is implicitly part of pulling (and, for that matter, pushing). Or, to flip it around, it’s helpful to remember that both pushing and pulling are the remote form of merging. Both commands do the exact same job: they move a branch to another computer, then merge it into another branch.

 

Having pulled in changes from the server, our copy of new-homepage is now up to date, and we can get back to work.

Resolving merge conflicts: remote edition

As we’ve just seen, pulling in remote changes always ends in a merge. And, as we also know, sometimes merges result in conflicts. If anything, pushes and pulls are more conflict-prone than other kinds of merges, because there are frequently more people and changes involved over longer stretches of time. And the risk of conflict is perhaps never greater than with the branch that, in most projects, changes most frequently: the origin’s shared copy of master.

 

In the last blog, I mentioned that it’s a good idea to keep each branch you’re working on that you eventually plan to merge into master updated with the latest changes in master. Put more simply, while working you should pull in the server’s master branch regularly, to reduce the risk of merge conflicts, and to help keep any conflicts that do occur as minimal as possible.

 

The command for this, if you haven’t guessed, is git pull origin master, which works similarly no matter which branch you’re in. Here we’ll try to pull changes from origin/master into our own copy of new-homepage:

(new-homepage) $: git pull origin master
From https://gitforhumans.info/our-website.git
* branch master -> FETCH_HEAD
Auto-merging about.html
CONFLICT (content): Merge conflict in about.html
Automatic merge failed; fix conflicts and then commit the result.

 

Oof! Once again, Git has been tripped up by a one-line difference on the About page. Just like when we changed Meghan’s title, a commit on our branch changed some text in the heading (from “About our site” to “Our Team”), while a commit on master changed the surrounding markup. If we open up about.html, we’ll see the conflicting change, surrounded by conflict notation:

<<<<<<< HEAD
<h1>Our Team</h1>
=======
<h1 class="big-heading">About our site</h1>
>>>>>>> 4f2d3c939deaf8f2824d2be04cb59b3f8342aedb

The good news is that the process for resolving a merge conflict is exactly the same whether it’s the result of a local git merge, or an attempted git pull. Just like last time, we need to replace all of this with the version of the text we want to end up with in this branch:

<h1 class="big-heading">Our Team</h1>

 

Next, stage and commit the change to resolve the conflict in our local branch.

(new-homepage *) $: git add -A
(new-homepage *) $: git commit -m "Merge origin/master into new-homepage, with resolved conflicts"

 

Once this commit is done, our branch is fully up to date with the server’s master. You can now push these changes, including the merge commit we just created, to the server’s copy of this branch, or keep working.

 

While we’re here, let me draw your attention to some new notation that I used in the commit message. Remote branch names often take the form remotename/branchname, as in origin/master (that is, the copy of master that lives on the origin remote) or testserver/bugfix (the bugfix branch on the testserver remote).

 

Although remote branches almost always correspond to (or track) a branch on your own computer, they are technically separate branches, and this slash notation is a good way of distinguishing between the two copies without having to always say, as I did just now, “the copy of master on the origin remote.”

 

Dealing with (push) rejection

While we’ve continued to work on the design for our new homepage, the teammate who helped us fix some JavaScript earlier has found and fixed another bug in our code. She committed and pushed her bug fix to the remote branch, but got pulled into a meeting before she could let us know she added some changes to our branch.

Meanwhile, we try to push some changes of our own to the branch and this happens:
(new-homepage) $: git push origin new-homepage
To https://gitforhumans.info/our-website.git
! [rejected] new-homepage -> new-homepage (non-fast-forward)
error: failed to push some refs to 'https://gitforhumans.info/our-website.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

 

Gulp. What causes Git to reject changes you’re trying to push?

Generally speaking, server-side Git repos don’t have working copies, staging areas, or, for that matter, human users who could help resolve merge conflicts. In fact, the lack of a working copy means remotes generally can’t merge branches together at all if they require more than a simple fast-forward to merge in. The response from git push tells us as much:

! [rejected] new-homepage -> new-homepage (non-fast-forward)

 

Fortunately, this situation is easily fixed by pulling changes down from the server, and then trying to push again. Fast-forwards work by moving a branch’s HEAD pointer from the commit it’s currently on to one of its direct descendants. When you pull in changes, the result is a merge commit—which happens to be a direct descendant of the remote branch’s current head commit, and therefore qualifies for a fast-forward. Boom.

 

Long story short: if you want to avoid this kind of rejection, or any kind of Git shenanigans, always pull before you push to make sure your own local copy is up to date. There’s rarely any harm to pulling changes, and frequently lots of benefit.
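The whole rejection-and-recovery cycle can be replayed safely in a scratch directory. In this sketch a bare repo plays the server and two clones (ours and teammate, names of my choosing) play the two contributors. I pass --no-rebase so that the pull merges, as described above, and --no-edit so that Git accepts the default merge message without opening an editor.

```shell
work=$(mktemp -d)
git init -q --bare "$work/our-website.git"   # the shared "server"

# Our copy: one commit on new-homepage, pushed to the server.
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/new-homepage
git config user.name "Us"
git config user.email "us@example.com"
git remote add origin "$work/our-website.git"
echo "var slides = 3;" > carousel.js
git add carousel.js
git commit -q -m "Add carousel"
git push -q origin new-homepage

# Our teammate clones the branch, commits a fix, and pushes first.
git clone -q -b new-homepage "$work/our-website.git" "$work/teammate"
git -C "$work/teammate" config user.name "Teammate"
git -C "$work/teammate" config user.email "teammate@example.com"
echo "// bug fix" >> "$work/teammate/carousel.js"
git -C "$work/teammate" commit -q -am "Fix carousel bug"
git -C "$work/teammate" push -q origin new-homepage

# Meanwhile we commit something else, unaware of her push.
echo "body { margin: 0; }" > style.css
git add style.css
git commit -q -m "Add stylesheet"

# First attempt: the server cannot fast-forward, so it refuses.
if git push -q origin new-homepage 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Pull (fetch, then merge locally), and push again.
git pull -q --no-rebase --no-edit origin new-homepage
if git push -q origin new-homepage; then
  second_push=accepted
else
  second_push=rejected
fi

echo "first push: $first_push, second push: $second_push"
```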

 

TRACKING BRANCHES

By default, nothing connects local and remote copies of a given branch. Even though they share the same name, and we know they logically represent the same piece of work, Git doesn’t yet know that our local new-homepage and the server’s new-homepage are in any way related, which is why we always have to tell git pull and git push which remote branches we want to work with. As elsewhere in Git, this need to be explicit can be annoying—but it’s also powerful.

 

You can potentially pull changes into new-homepage from any branch, on any remote. You could run git pull maniks-computer new-homepage-with-sass—where maniks-computer is your colleague Manik’s laptop, and new-homepage-with-sass is a branch converting your CSS styles to Sass—and it would totally work.

 

Having said that, there is value in telling Git when the local and remote versions of a branch are related, by telling Git that a local branch is tracking its remote counterpart. For instance, when a branch is set up for tracking, you can push and pull changes by typing just git push or git pull, with no other arguments. Git will understand what you mean, and do the right thing.

 

The simplest way to set up a tracking relationship is to include the --set-upstream (or -u) option when invoking git push.

(new-homepage) $: git push -u origin new-homepage
Branch new-homepage set up to track remote branch new-homepage from origin.
Everything up-to-date

You only need to do this once per local branch, and if you forget to do it the first time you push, that’s fine—you can do it any time, even if you have no new changes to push (indicated here by Git telling us everything is up to date).
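If you are curious whether tracking really took hold, a sketch like this one sets it up in a scratch repo and then asks Git which remote branch the local branch follows. The @{upstream} suffix is Git’s own shorthand for “the branch this one tracks”; the directory names are assumptions of the example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/our-website.git"   # stands in for the server
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/new-homepage
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/our-website.git"
echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "Start homepage"

# -u (--set-upstream) pushes and records origin/new-homepage as the upstream.
git push -q -u origin new-homepage

# Ask Git which remote branch the local one now tracks.
upstream=$(git rev-parse --abbrev-ref 'new-homepage@{upstream}')
echo "new-homepage tracks $upstream"

# With tracking in place, a plain "git push" needs no further arguments.
echo "<p>More</p>" >> index.html
git commit -q -am "Add paragraph"
git push -q
```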

 

MAKING FETCH HAPPEN

Git has one other remote-related command that’s worth talking about. On the surface, git fetch sounds maddeningly similar to git pull. But whereas git pull works to pull down changes for just a single branch, git fetch can pull down everything from an entire remote repository at once.

You’ll notice that when we run git fetch origin, the output is very familiar:

$: git fetch origin
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://gitforhumans.info/our-website.git
9eb7cf6..fed3ac5 master -> origin/master

 

First we see the same object-copying gobbledygook we’ve noticed several times already. However, at the bottom you can see that something has happened other than just copying a bunch of data from the server, something different from the merges or fast-forwards we’ve gotten used to. Specifically, Git has saved a copy of the server’s master branch to a special, read-only branch on our local copy called origin/master.

 

Part of git fetch’s job is to allow you to work offline. When I say git fetch works on whole repositories, I mean that: by default, it pulls down a snapshot of every branch in a remote so that you can compare, merge, or do any other sort of work with those branches without needing to be online the whole time.

 

When Git was developed in 2005, before smartphones and airplane Wi-Fi were ubiquitous, if you wanted to work from a café or during a flight, you needed to have pulled down a copy of everything on to your computer. But you would not necessarily have wanted to take the extra step of merging every branch on the server with every branch on your computer. (For one thing, what if you had changes in a branch that weren’t ready to merge in? What if some branches had conflicts?)

 

Git’s solution is to keep track of the state of each branch in your remote repositories using a system of read-only, namespaced branches on your local copy of the repo. I lied a little bit when I said earlier that origin/* was just a notation for identifying remote branches. origin/master is also an actual branch saved in your local copy of the repository.

 

After fetching, you end up with copies of every single branch on the remote, even those that don’t have a local equivalent on your copy of the project, such as branches started by other people.

 

For safety and speed, Git tries only to use the network for moving commits around, and does any real work on your computer. So, instead of trying to compare data on your computer with data on the server, Git instead makes a copy of what’s on the server and lets you compare or merge against that.

 

The origin/master branch represents the origin remote’s master branch, pointing to whatever commit was at the head of that branch the last time you pulled it from the server.

 

Having these special offline copies of your remote branches can complicate matters rather than simplify them. For instance, we actually have three different branches called master: your local master, the remote’s master, and your local origin/master that’s supposed to—but isn’t guaranteed to be—in sync with the remote master.

 

Thankfully, branches like origin/master are read-only, and are designed to only ever represent a copy of what’s on the server. Once you run git fetch, you can generally assume that each offline branch is an accurate representation of its twin on the server, and go from there.

 

CHECKING OUT AN EXISTING BRANCH

 

In most of the teams I’ve worked on, most branches have been owned by just one person, who was both the branch’s original creator and usually also the one responsible for merging it into master when the work was complete.

 

However, many projects are bigger than one person and take longer than a day to finish, and you may not be the first to be asked to work on a particular branch. You may even join a branch while someone else is still working on it, and many people may be contributing all at once. So how do you add a commit to someone else’s branch?

 

First, you need to check it out. To do that, we’ll use git fetch to pull down copies of all of the branches currently on the server:

[master] $: git fetch origin 
remote: Counting objects: 5, done. 
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done. 
From https://gitforhumans.info/our-website.git
9eb7cf6..fed3ac5 master -> origin/master
9eb7cf6..fed3ac5 new-homepage -> origin/new-homepage

 

Having fetched the latest stuff from origin, all our server branches are now available to us on our computer, even offline. We’ll need to be online to push changes back up to the server, but we can do almost anything else until then.

 

For instance, we can ask Git to give us a list of every branch that existed on the server as of the last time we ran git fetch. Although by default the git branch command will only tell you what branches exist on your local copy, you can give it the --remotes (-r) flag to ask it to instead show you all of the branches Git knows about from your remotes.
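Here is a rough sketch of that listing, assuming a POSIX shell with git installed. The repositories live in a temporary directory, and the names server and me are placeholders I made up; new-homepage stands in for a branch someone else started.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q server
git -C server symbolic-ref HEAD refs/heads/master
git -C server commit -q --allow-empty -m "Initial commit"
git -C server branch new-homepage           # a branch started by someone else

git clone -q server me
local_branches=$(git -C me branch)          # only the branches in your copy
remote_branches=$(git -C me branch -r)      # branches known from your remotes

echo "Local:";  echo "$local_branches"
echo "Remote:"; echo "$remote_branches"
```

The remote list includes origin/new-homepage even though no local branch by that name exists yet.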

 

Any of these can be checked out and worked on, or merged into one of your branches. git pull origin master is, in fact, just a shortcut for a git fetch, followed by git merge origin/master.
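If you want to convince yourself that the two routes end up in the same place, here is a small sketch, assuming a POSIX shell with git installed. Clone a takes the long way (fetch, then merge), clone b uses pull; the directory names are invented for the demo.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q server
git -C server symbolic-ref HEAD refs/heads/master
git -C server commit -q --allow-empty -m "Initial commit"
git clone -q server a
git clone -q server b

# New work lands on the server after both clones were made.
git -C server commit -q --allow-empty -m "New work on the server"

# Clone a: two explicit steps.
git -C a fetch -q origin
git -C a merge -q origin/master

# Clone b: the shortcut.
git -C b pull -q origin master

head_a=$(git -C a rev-parse HEAD)
head_b=$(git -C b rev-parse HEAD)
server_head=$(git -C server rev-parse master)
echo "a: $head_a"
echo "b: $head_b"
```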

 

Next, we’ll check out the branch we want to work on, helpfully called make-logo-bigger. We don’t need to include the origin/ prefix; if you’re checking out a remote branch for the first time, Git will first check to see if you have a local branch by that name, and if not will automatically set up a new local branch to track the remote one.

[master] $: git checkout make-logo-bigger
Branch make-logo-bigger set up to track remote branch make-logo-bigger from origin.
Switched to a new branch 'make-logo-bigger'
[make-logo-bigger] $:
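To check that the tracking relationship really was set up, you can ask Git which upstream a branch follows. This is a minimal sketch, assuming a POSIX shell with git installed and a throwaway server and clone whose directory names are made up.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q server
git -C server symbolic-ref HEAD refs/heads/master
git -C server commit -q --allow-empty -m "Initial commit"
git -C server branch make-logo-bigger

git clone -q server me
cd me
git checkout -q make-logo-bigger    # no origin/ prefix needed

current=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref 'make-logo-bigger@{upstream}')
echo "on branch: $current"
echo "tracking:  $upstream"
```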

 

We’ve talked about where version control came from, and how to practice it on our own projects using Git. We now know how to make commits, create and merge branches, and synchronize our changes with other computers— and, by extension, with other people. Along the way, we’ve started to build up a history around our project.

 

GIT IS AN EXCELLENT TOOL for synchronizing changes across all our computers, and that’s how we almost always use it—to keep each other in sync with what we’re doing right now. But although most of the time all we care about is the current version, or a few current ones, Git does a great job of storing and tracking every version of our project, and those other thousands of commits are still there, ripe for exploration.

 

Every commit you add to your repository contributes to the historical record of your project, so it’s a good idea to make the best, most meaningful commits you can. In this final blog, we’ll look at some of Git’s tools for inspecting your project's history, and how useful this history can be.

 

READING THE LOG


The simplest way to inspect your project’s history is as an ordered list of commits. Git’s primary tool for viewing such a list is the git log command. Hosting services like GitHub offer web-based tools for browsing your old commits; they do the same job as git log with a little more user-friendly panache.

 

The advantage to learning git log is that, like the rest of Git’s command-line interface, it works the same way no matter what computer or hosting service you use. And, unlike GitHub’s commit search, it works offline.

 

By default, invoking git log will show a list of every commit in your project, from the current head commit all the way back to the beginning, in reverse chronological order, like so:

$: git log
commit 45b1ec87cd2fde95a110dfe3028e93d25c9af186
Author: Thesis Demaree <Thesis@ThesisScientist.com>
Date:   Fri Dec 26 16:28:41 2014 -0500

    Rename styles.css to main.css

commit bf8144d4690d3f6052dc7f42135e3e9944b96b5a
Author: Thesis Demaree <Thesis@ThesisScientist.com>
Date:   Thu Dec 25 13:24:25 2014 -0600

    Initial commit

The lines starting with commit denote, well, a commit, each of which takes up a few lines. The long string of letters and numbers is that commit’s ID. Below that, we see the Author who made this commit (me), and the Date on which it was added. Finally, there’s the commit’s log message, shown indented underneath the metadata.

 

This is the history we’ve been crafting as we make changes and commits on the project. Logs like these are why commit messages exist, and why it’s good for them to be short. Ideally, you should get a sense of how this project has evolved over time just from paging through git log’s output and scanning the log messages.

 

The previous example shows the log’s default output. But Git can tell you as much or as little about your commits as you want, in virtually any format. The --pretty option allows you to select from a number of predefined formats, or specify your own using a format string. Here’s the built-in oneline format, which shows only the commit ID and log message on a single line:

$: git log --pretty=oneline
45b1ec87cd2fde95a110dfe3028e93d25c9af186 Rename styles.css to main.css
bf8144d4690d3f6052dc7f42135e3e9944b96b5a Initial commit

 

A complete list of available log formatters, and the syntax for defining your own as a format string, can be found in the Git documentation. And Atlassian has published a thorough yet friendly tutorial showing all the options for formatting log output, including some brief explanations of why you would use certain formats.
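As a quick taste of format strings, here is a sketch, assuming a POSIX shell with git installed. It uses three placeholders: %h (short commit ID), %an (author name), and %s (the first line of the message). The repository and the Demo User identity are invented for the example.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q site && cd site
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"
git commit -q --allow-empty -m "Rename styles.css to main.css"

oneline=$(git log --pretty=oneline)                 # full ID + subject
custom=$(git log --pretty='format:%h - %an: %s')    # short ID, author, subject

echo "$oneline"
echo "$custom"
```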

 

Specifying your starting point


As we learned earlier in the blog, Git’s concept of history is based on lineage: a commit contains a reference to one or more parent commits, which point to their parent commits, which point to their parent commits, all the way back to the beginning. git log appears to show a history of our project in reverse chronological order, but the chronology is kind of a side effect.

 

What it’s really doing is following the chain of parent commits to show you where your current commit comes from.

 

By default, the list it shows starts from the head commit on your current branch. If you have master checked out, it’ll show you the complete ancestry of the top commit on master. You can, however, ask it to use any commit or branch as a starting point. For example, here we’re asking it to show the history on our new-homepage branch:

$: git log new-homepage

 

Viewing a range of commits


You can even specify commit ranges; that is, you can ask for all log entries between two commit references so that you can see only what has changed between any two points in your project history. This is best for seeing a list of what differs in a topic branch since we branched off. Here, we’ll ask git log to show all the commits that have been added to new-homepage that aren’t yet merged into master, using the --oneline option to make the log output easier to scan:

 

$: git log --oneline master..new-homepage
bce44eb Bigger navigation buttons
056c8fd Update hero area w/ new background image
7e53652 Make font loading async

 

Our range is listed here as start..end, or rather, olderbranch..newerbranch, or to be really pedantic, branch..branchwithdifferentcommits. You see, git log doesn’t care about chronology and, as we know, there’s nothing stopping master from having its own changes that aren’t yet merged into a topic branch like new-homepage.

 

The simplest way to explain what git log branch-a..branch-b does is that it shows you a list of all the commits in branch-b that aren’t in branch-a. In the previous example, we see three commits from new-homepage that aren’t yet merged into master.

What’s really cool is that we can ask git log to show us a list the other way around—to give us a list of commits in master that aren’t in new-homepage:

 

$: git log --oneline new-homepage..master
5514d53 Fix JavaScript bug on products page
4af326c Support for Microsoft Edge

This works with remote branches too, so you can find out if your local copy of a branch is trailing behind the server’s copy. Here I’m asking git log to show me a list of commits on the server that I haven’t pulled into my local branch yet, with a custom format string so I can see who made each commit:

$: git log --pretty='format:%h - %an: %s' new-homepage..origin/new-homepage
635ce39 - Susi Oliver: Important legalese change
65ae00e - Thesis Scientist: Make many (JS) promises

 

If either side of the commit range is your current HEAD commit—that is, the commit that’s currently checked out into your working copy—you can leave it blank. Here we’ve got new-homepage checked out, and we’re asking to see a list of new commits from master:

[new-homepage] $: git log --oneline ..master
5514d53 Fix JavaScript bug on products page
4af326c Support for Microsoft Edge

 

This is exactly the same result as when we asked for a log on new-homepage..master earlier. Because new-homepage is checked out, Git infers that’s the other side of the comparison we’re asking for, saving us a little typing.
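The three range forms can be tried side by side in a scratch repository. This is only a sketch, assuming a POSIX shell with git installed; the commits are empty and the identity is made up, but the branch names and messages echo the ones used above.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q site && cd site
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b new-homepage
git commit -q --allow-empty -m "Make font loading async"
git commit -q --allow-empty -m "Bigger navigation buttons"

git checkout -q master
git commit -q --allow-empty -m "Fix JavaScript bug on products page"

# Commits on new-homepage that master does not have, and the reverse.
only_on_topic=$(git log --format=%s master..new-homepage)
only_on_master=$(git log --format=%s new-homepage..master)

# With new-homepage checked out, the left side can be left blank.
git checkout -q new-homepage
implied_head=$(git log --format=%s ..master)

echo "$only_on_topic"
echo "$only_on_master"
```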

 

Filtering the log


Finally, as if that weren’t enough, you can pass filtering options to git log to limit the list of commits: to show only a certain number of recent commits, or only those from within a certain date range, or only those added by a certain member of the team. For example, this command will show only commits in one of my repos that were added by me, that include the word “heroku,” that are more than three years old, and that changed the file called Gemfile:

$: git log --author=Demaree --grep=heroku --before="3 years ago" --oneline Gemfile
94d8ecb Gemfile tweaks to remove heroku
ccc5266 Merged heroku prep into master
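Here is a small sandbox for trying these filters, assuming a POSIX shell with git installed. The authors Alice and Bob, the file contents, and the backdated first commit are all invented for the demo; --author, --grep, --before, -n, and a trailing path are standard git log options.

```shell
set -e
# Hypothetical committer identity; authors are set per commit below.
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q app && cd app
git symbolic-ref HEAD refs/heads/master

echo "source 'https://rubygems.org'" > Gemfile
git add Gemfile
# Backdate the first commit so the date filter has something to find.
GIT_AUTHOR_DATE="2010-06-01 12:00:00 +0000" GIT_COMMITTER_DATE="2010-06-01 12:00:00 +0000" \
  git commit -q --author="Alice <alice@example.com>" -m "Add Gemfile"

echo "gem 'pg'" >> Gemfile
git commit -q -a --author="Bob <bob@example.com>" -m "Gemfile tweaks to remove heroku"

echo "gem 'puma'" >> Gemfile
git commit -q -a --author="Alice <alice@example.com>" -m "Merged heroku prep into master"

echo "Deploys to heroku" > README.md
git add README.md
git commit -q --author="Alice <alice@example.com>" -m "Mention heroku in README"

# Alice's commits that mention heroku AND touched Gemfile.
by_alice=$(git log --author=Alice --grep=heroku --format=%s -- Gemfile)
# Commits made before 2011.
old=$(git log --before="2011-01-01" --format=%s)
# Only the two most recent commits.
latest_two=$(git log -n 2 --format=%s)

echo "$by_alice"
```

Note that --grep is case-sensitive unless you also pass -i.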

 

THE LONG AND SHORT OF COMMIT IDS

The unique ID of a given commit is among the most important things you might use git log to look up. Git’s commit IDs serve a few purposes, but the most important one is the most straightforward: we use them to identify a commit, as in, “that change that messed up all the image tags happened in 65ae00e.” So far, we’ve mostly seen commit IDs in a short form like that. Occasionally, though, you’re likely to see commit IDs in their longer, unabridged form, like this: 65ae00edfe8a795199ed416a9d6df8c3cfe8bd0a

 

What’s the difference? And why does Git use these weird-looking strings of letters and numbers to identify revisions, instead of just a number?

 

As covered in the last blog, even though many of us use Git in a centralized way, Git is designed to be decentralized. Every one of our computers has its own copy of the repository, which can evolve independently from the others.

 

You and I can each make changes and commit them to a branch while offline, and neither of us needs to know what the other is doing until later, when we sync our local copies with a remote. As we make those commits, Git needs to be able to assign an identifying name or number to each one, but Git can’t know ahead of time whether some other computer has already used that name or number.

 

What’s more, Git’s design values stability and data integrity above all else. In a 2007 presentation, Linus Torvalds talked about the need for version control systems to look after the veracity of the data under their care, and talked up Git’s features for ensuring correct data:

 

If you have disk corruption, if you have any kind of problems at all, Git will notice them ... I guarantee you, if you put your data in Git, you can trust the fact that five years later, after it was converted from your hard disk to DVD to whatever new technology and you copied it along, five years later you can verify that the data you get back out is the exact same data you put in.

 

Git solves both problems by creating and using IDs based on the contents of each commit, rather than arbitrarily assigning each one a name or number. Technically, commit IDs aren’t identifiers so much as checksums, a kind of digital fingerprint, typically used to validate data that has been transmitted over a network.

 

You’ll often see a list of checksums alongside software builds, so people downloading, say, a prerelease build of Windows can verify that the downloaded file is complete, and hasn’t been tampered with.

 

When you make a commit, Git takes everything that constitutes the body of the commit (your name and email address, the current date and time, the commit message, references to any parent commits, and the current project snapshot) and runs it all through a hashing function (SHA-1, by default) to generate that 40-character string.

 

The result is a value that’s virtually guaranteed to uniquely identify a given commit. That’s true even if the same commit is made on two different computers.

 

Two identical commits will have identical hashes, and therefore identical IDs, regardless of which computer added them to the repo. Conversely, commits that differ in any way (even just by having a different author) will, for all practical purposes, have different IDs. Hash collisions are theoretically possible but so unlikely that you can treat each hash as uniquely identifying a single commit.
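You can watch this happen with a sketch like the following, assuming a POSIX shell with git installed. It creates the “same” commit in separate repositories by pinning the author, committer, and timestamps through Git’s environment variables; make_commit is a helper I made up for the demo, as are the repo names and identities.

```shell
set -e
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
# Pin the timestamps, since the date is part of what gets hashed.
export GIT_AUTHOR_DATE="2014-12-26 16:28:41 -0500"
export GIT_COMMITTER_DATE="2014-12-26 16:28:41 -0500"
cd "$(mktemp -d)"

# make_commit DIR AUTHOR: create a repo with one fixed commit, print its ID.
make_commit() {
  git init -q "$1" >/dev/null 2>&1
  printf '# My Project\n' > "$1/README.md"
  git -C "$1" add README.md
  GIT_AUTHOR_NAME="$2" git -C "$1" commit -q -m "Initial commit"
  git -C "$1" rev-parse HEAD
}

id_a=$(make_commit repo-a "Demo User")
id_b=$(make_commit repo-b "Demo User")      # same content, different repo
id_c=$(make_commit repo-c "Someone Else")   # only the author differs

# The ID is a hash of the commit object itself, so it can be recomputed.
recomputed=$(git -C repo-a cat-file commit HEAD | git -C repo-a hash-object -t commit --stdin)

echo "repo-a: $id_a"
echo "repo-b: $id_b"
echo "repo-c: $id_c"
```

repo-a and repo-b print the same ID; repo-c, which differs only in the author’s name, prints a different one.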

 

While these long hashes help smooth collaboration, by making it easier to swap commits between computers, they also create a new problem for us. Because they are so long, reading and writing them can be unwieldy.

 

Fortunately, even if you provide only a fragment of the full commit ID, Git is smart enough to figure out what commit you want, as long as the short ID is at least four characters long, and unique within your repo.

 

For instance, the commit ID I showed at the start of this section could be shortened to as few as four characters (65ae) without overlapping with any other commits in that project. In fact, in most Git repositories, a seven-character ID like 65ae00e is sufficient to uniquely identify any commit, even in repositories with tens of thousands of commits. For that reason, Git will frequently use short IDs in its responses to you rather than the longer form.

 

In the rare scenarios when two short IDs overlap, Git is also smart enough to handle things gracefully by automatically adding digits to the short IDs it prints out. In the Linux kernel project, for instance—perhaps the oldest Git repository and certainly one of the biggest— it turns out that seven characters are not enough to avoid overlapping IDs, but eleven digits do work, so Git automatically switches its short ID format to use the fewest digits that will still be unique across the whole project.
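To see short IDs resolve back to full ones, here is a minimal sketch, assuming a POSIX shell with git installed and a scratch repository with a made-up identity. The ^{commit} suffix is a hint telling Git the prefix should name a commit, which guards against an unlucky clash with some other object in the repo.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q site && cd site
git commit -q --allow-empty -m "Initial commit"

full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)          # Git's own choice of abbreviation
four=$(printf '%s' "$full" | cut -c1-4)      # the shortest prefix Git accepts

from_short=$(git rev-parse "$short")
from_four=$(git rev-parse "$four^{commit}")

echo "full:  $full"
echo "short: $short"
echo "four:  $four"
```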

 

COMMIT MESSAGES


Git and tools like GitHub offer many ways to view what actually changed in a commit. But a well-crafted commit message can save you from having to use those tools by neatly (and succinctly) summarizing what changed.

The log message is arguably the most important part of a commit, because it’s the only place that captures not only what was changed, but why.

 

What goes into a good message? First, it needs to be short, and not just because brevity is the soul of wit. Most of the time, you’ll be viewing commit messages in the context of Git’s commit log, where there’s often not a lot of space to display text.

 

Think of the commit log as a newsfeed for your project, in which the log message is the headline for each commit. Have you ever skimmed the headlines in a newspaper (or, for a more current example, BuzzFeed) and come away thinking you’d gotten a summary of what was happening in the world? A good headline doesn’t have to tell the whole story, but it should tell you enough to know what the story is about before you read it.

 

If you’re working by yourself, or closely with one or two collaborators, the log may seem interesting just for historical purposes, because you would have been there for most of the commits. But in Git repositories with a lot of collaborators, the commit log can be more valuable as a way of knowing what happened when you weren’t looking.

 

Commit messages can, strictly speaking, span multiple lines, and can be as long or as detailed as you want. Git doesn’t place any hard limit on what goes into a commit message, and in fact, if a given commit does call for additional context, you can add additional paragraphs to a message, like so:

 

Updated Ruby on Rails version because security

Bumped Rails version to 3.2.11 to fix JSON security bug.

Note that although this message contains a lot more context than just one line, the first line matters most, because it is the only line shown in condensed views of the log (such as git log --oneline) and it reads as the headline of the entry:

commit f0c8f185e677026f0832a9c13ab72322773ad9cf
Author: Thesis Scientist <thesis@thesis.com>
Date:   Sat Jan 3 15:49:03 2013 -0500

    Updated Ruby on Rails version because security

Like a good headline, the first line here summarizes the reason for the commit; the rest of the message goes into more detail.
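One way to write a message like this without opening an editor is to pass -m more than once; Git joins the pieces as separate paragraphs. A small sketch, assuming a POSIX shell with git installed and a scratch repository with an invented identity:

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q app && cd app
git commit -q --allow-empty \
  -m "Updated Ruby on Rails version because security" \
  -m "Bumped Rails version to 3.2.11 to fix JSON security bug."

subject=$(git log -1 --format=%s)    # first line only
body=$(git log -1 --format=%b)       # everything after the first paragraph
oneline=$(git log -1 --oneline)      # condensed view: short ID + subject

echo "$oneline"
```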

 

Writing commit messages in your favorite text editor

 


Although the examples in this blog all have you type your message inline, using the --message or -m argument to git commit, you may be more comfortable writing in your preferred text editor. Git integrates nicely with many popular editors, both command-line ones (e.g., Vim, Emacs) and more modern, graphical apps like Atom, Sublime Text, or TextMate.

 

With an editor configured, you can omit the --message flag and Git will hand off a draft commit message to that other program for authoring. When you’re done, you can usually just close the window and Git will automatically pick up the message you entered. To take advantage of this sweet integration, first you’ll need to configure Git to use your editor (specifically, your editor’s command-line program, if it has one). Here, I’m telling Git to hand off commit messages to Atom:

 

$: git config --global core.editor "atom --wait"

Every text editor has a slightly different set of arguments or options to pass in to integrate nicely with Git. (As you can see here, we had to pass the --wait option to Atom to get it to work.)
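The same pattern applies to other editors. As one more illustration, and assuming you use Visual Studio Code with its code command-line launcher on your PATH, the equivalent setting would look like the sketch below. It sets the option for a single scratch repository only; add --global to apply it everywhere.

```shell
set -e
cd "$(mktemp -d)"
git init -q site && cd site

# Repo-local setting; "code --wait" keeps Git waiting until the editor tab is closed.
git config core.editor "code --wait"

editor=$(git config core.editor)
echo "Git will open commit messages with: $editor"
```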

 

Elements of commit message style


There are few hard rules for crafting effective commit messages—just lots of guidelines and good practices, which, if you were to try to follow all of them all of the time, would quickly tie your mind in knots.

To ease the way, here are a few guidelines I’d recommend always following.

 

Be useful

The purpose of a commit message is to summarize a change. But the purpose of summarizing a change is to help you and your team understand what is going on in your project. The information you put into a message, therefore, should be valuable and useful to the people who will read it.

 

As fun as it is to use the commit message space for cursing—at a bug, or Git, or your own clumsiness—avoid editorializing. Avoid the temptation to write a commit message like “Aaaaahhh stupid bugs.” Instead, take a deep breath, grab a coffee or some herbal tea or do whatever you need to do to clear your head. Then write a message that describes what changed in the commit, as clearly and succinctly as you can.

In addition to a short, clear description, when a commit is relevant to some piece of information in another system—for instance, if it fixes a bug logged in your bug tracker—it’s also common to include the issue or bug number, like so:

 

Replace jQuery onReady listener with plain JS; fixes #1357

Some bug trackers (including the one built into every GitHub project) can even be hooked into Git so that commit messages like this one will automatically mark the bug numbered 1357 as done as soon as the commit with this message is merged into master.

 

Be detailed (enough)


As a recovering software engineer, I understand the temptation to fill the commit message—and emails, and status reports, and stand-up meetings— with nerdy details. I love nerdy details. However, while some details are important for understanding a change, there’s almost always a more general reason for a change that can be explained more succinctly.

 

Besides, there’s often not enough room to list every single detail about a change and still yield a commit log that’s easy to scan in a Terminal window. Finding simpler ways to describe something doesn’t just make the changes you’ve made more comprehensible to your teammates; it’s also a great way to save space.

 

A good rule of thumb is to keep the “subject” portion of your commit messages to one line, or about 70 characters. If there are important details worth including in the message, but that don’t need to be in the subject line, remember you can still include them as a separate paragraph.

 

Be consistent

However you and your colleagues decide to write commit messages, your commit log will be more valuable if you all try to follow a similar set of rules. Commit messages are too short to require an elaborate style guide, but having a conversation to establish some conventions, or making a short wiki page with some examples of particularly good (or bad) commit messages, will help things run more smoothly.

 

Use the active voice

The commit log isn’t a list of static things; it’s a list of changes. It’s a list of actions you (or someone) have taken that have resulted in versions of your work. Although it may be tempting to use a commit message to label a version of the work—“Version 1.0,” “Jan 24th deliverable”—there are other, better ways of doing that. Besides, it’s all too easy to end up in an embarrassing situation like this:

# Making the last homepage update before releasing the new site
$: git commit -m "Version 1.0"

# Ten minutes later, after discovering a typo in your CSS
$: git commit -m "Version 1.0 (really)"

# Forty minutes later, after discovering another typo
$: git commit -m "Version 1.0 (oh FFS)"

 

Describing changes is not only the most correct format for a commit message, but it’s also one of the easiest rules to stick to. Rather than concern yourself with abstract questions like whether a given commit is the release version of a thing, you can focus on a much simpler story: I just did a thing, and this is the thing I just did.

 

Those “Version 1.0” commits, therefore, could be described much more simply and accurately:

$: git commit -m "Update homepage for launch"

$: git commit -m "Fix typo in screen.scss"

$: git commit -m "Fix misspelled name on about page"

 

I also recommend picking a tense and sticking with it, for consistency’s sake. I tend to use the imperative present tense to describe commits: Fix misspelled name on About page rather than fixed or fixing. There’s nothing wrong with fixed or fixing, except that they’re slightly longer. If another style works better for you or your team, go for it—just try to go for it consistently.

 

What happens if your commit message style isn’t consistent? Your Git repo will collapse into itself and all of your work will be ruined. Kidding! People are fallible, lapses will happen, and a little bit of nonsense in your logs is inevitable. Note, though, that following style rules like these gets easier the more practice you get. Aim to write the best commit messages you can, and your logs will be better and more valuable for it.

 

MAKING GOOD COMMITS


For us humans, the job of a commit is to bundle changes into logical chunks. Sometimes, the logic behind a particular set of changes is as simple as: “This is when I, the developer, felt it made sense to save my progress”. But sometimes there’s more of a story—more meaning—behind a change.

 

For a software tool so concerned with keeping your data clean and consistent, Git is remarkably flexible about exactly what you commit and when. One really cool (and potentially confusing) thing about Git is that it doesn’t require you to stage or commit everything you’ve changed all at once.

 

Git lets you move some changed files—or even changed parts of files—down the path from working copy to committed, while leaving other stuff unstaged or uncommitted. If you make three sort-of-unrelated changes to a single stylesheet file, you can commit each of the changes separately, or together, as you see fit.

 

Let’s say you’re working on a project for which you’ve changed both a JavaScript file and your project README, for unrelated reasons. Here’s our status:

[master] $: git status
# On branch master
#
# Initial commit
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# README.md
# site.js

 

The simplest thing would be to commit both files at the same time, with a joint log message like “Add onReady event listener; update README.” But if committing the two changes separately is more meaningful, and provides more context for your logs, Git makes it relatively easy to do that.

 

First, let’s stage and commit one of our two changes:

$ git add site.js
$ git commit -m "Add onReady event listener"
[master 591672e] Add onReady event listener
1 file changed, 3 insertions(+)

After we do that, our README is still modified and unstaged, ready for us to commit separately:

$ git add README.md
$ git commit -m "Update README"
[master 96406dd] Update README
1 file changed, 1 insertion(+)

Now, if we check the log, we’ll be at least a little more confident that each entry in it—each commit—represents a single, complete idea.

This is a hard thing to do perfectly all the time and, like a lot of other best practices, commits that are perfect, single units of work, wrapped up in a perfectly worded commit message, are the exception rather than the rule.

Don’t beat yourself up for a big, messy commit with a vague label like “fixed the header”—just know that better is possible and aim for it when you can.

 

COMPARING COMMITS


We’ve talked a lot about versions, states, and how changes add up incrementally over time. When we deal with our work one commit at a time, we’re encouraged to think beyond the state of our work right now and consider the state it was in yesterday, and the state it will be in tomorrow.

 

From writing commit messages and deciding what should go into a commit, we’re prompted to think about how we describe the actions we make as we make them, which eventually lends itself to a more thoughtful, considered approach to work.

 

Most of all, Git asks us to treat changes to our projects—more formally, the transitions between states represented by commits—as actual events that occurred. Each commit represents not only a snapshot of our whole project, but (except for the first one, of course) also a change from a previous commit.

 

Eventually, once you start thinking and working in versions, you will want or need to compare the versions to see, specifically, what has changed. A commit message can give you a summary, but Git also offers a handy way to actually inspect the differences between two commits.

 

git diff (short for “difference”) shows the changes between two versions of your project, or two versions of a given file or files. In this way it’s a lot like git log, and in fact you can choose to see diff information in your log output if you want. In addition to comparing committed versions, if you’ve made uncommitted changes, you can use git diff to show you everything that’s different between your working tree and the last commit.

 

Here, git diff shows us a simple change to the README file we were looking at before:

$: git diff
diff --git a/README.md b/README.md
index 0c0a11f..48fb805 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,3 @@
-# My Project
\ No newline at end of file
+# My Project
+
+This is a project managed by Git.
\ No newline at end of file

Admittedly, this is not the easiest thing to read. git diff’s default output follows the format of a Unix comparison tool (itself called “diff”) originally developed in the early 1970s, and is displayed using a paging program called “less”, whose job it is to display texts longer than your Terminal window. (I should point out that this is a rather simple example. Most of the time, your diffs will be longer and more complex.)

Here’s what’s going on in this diff: the lines starting with dashes (-) are ones that we’ve deleted since the last commit; the lines starting with plus signs (+) have been added.

 

Diffs (like Git generally) focus on changed lines in your files, and changing even one character in a line will cause Git to consider the line changed. Also, just as renaming a file is seen by Git as a combination of deleting the old file and adding a new one, changing a line is seen as both a deletion and an addition.

 

You can see that in this diff: the main headline is present in both versions, as a deletion and an addition. What changed? In the committed version, the headline wasn’t followed by a line break. (Yes, even adding a line break is enough for the line to be marked as changed.)
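If you’d like to reproduce a diff like this one yourself, here is a rough sketch, assuming a POSIX shell with git installed. It uses printf to write a README without a trailing line break, which is what triggers the “No newline at end of file” notes; the repository and identity are throwaways.

```shell
set -e
# Hypothetical identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q project && cd project
printf '# My Project' > README.md            # note: no trailing line break
git add README.md
git commit -q -m "Add README"

printf '# My Project\n\nThis is a project managed by Git.' > README.md
uncommitted=$(git diff)                      # changes not yet staged or committed

git commit -q -a -m "Describe the project"
committed=$(git diff HEAD~1 HEAD)            # the same change, now between two commits

echo "$uncommitted"
```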

 

Diffs can be incredibly useful, but unless you’re very comfortable with the Unix diff format, they’re also one of the few things in Git for which I wholeheartedly endorse using a GUI tool, either as an app on your computer or as part of a hosting service like GitHub.

 

For projects hosted on GitHub or GitHub Enterprise, every repo has a compare view (accessible by appending /compare to your project’s URL) that does a great job of summarizing changes in an easy-to-use visual format.

 

Mac users might also consider Black Pixel’s app Kaleidoscope. Kaleidoscope is a general-purpose file comparison tool that can be used to compare any two files, regardless of whether they’re managed by Git. That said, it offers great integration with Git, including an easy-to-use setup tool that configures Git to open Kaleidoscope for diffs via the git difftool command.

 

Git does offer a simpler diff format that is quite easy for humans to read, if we return to the command line: the “diff stat,” which reduces a whole diff to a list of the files that differ between two versions, marked up to indicate how they’ve changed.

 

Here, I ask git diff to show stats for the difference between the current HEAD commit on Typekit’s Web Font Loader repo and the one before it (HEAD~1):

$ git diff --stat HEAD~1 
CHANGELOG | 3 +++
lib/webfontloader.rb  | 2 +-
webfontloader.gemspec | 2 +-
webfontloader.js | 4 ++--
4 files changed, 7 insertions(+), 4 deletions(-)

 

While more concise than the full diff, the diff stats offer a good summary of the changes in this commit. Each line shows a file that was changed in this commit, like lib/webfontloader.rb. Next to it, separated by a pipe (|) character, are the stats for that file: two changes (one addition and one deletion).

 

Knowing Git as well as we do now, observing that this is just one commit’s worth of changes, we can infer that it may have been a one-line edit, such as a change in version number.

 

From here, if we need more information, we can request a full diff of a particular file (using git diff HEAD~1 webfontloader.js), or a set of files (by passing in multiple file names), or the whole project. We can also ask for stats covering commits made over a much broader span of time:

$: git diff --stat HEAD~50
.travis.yml | 5 +-
CHANGELOG | 12 ++
README.md | 24 +--
lib/webfontloader.rb | 2 +-
package.json | 5 +-
spec/core/fontwatcher_spec.js | 3 -
spec/core/fontwatchrunner_spec.js | 441 +++++++++--------------
src/core/domhelper.js | 26 ++-
src/core/fontruler.js | 2 +-
src/core/fontwatcher.js | 22 +-
src/core/fontwatchrunner.js | 33 +--
webfontloader.gemspec | 6 +-
webfontloader.js | 42 ++--
13 files changed, 207 insertions(+), 416 deletions(-)

 

While this seems to roll up fifty commits’ worth of changes into a single summary, this is a good place to clarify that git diff (including the stats view) only compares two commits at a time. HEAD~50 doesn’t represent the last fifty commits, just the one commit that’s fifty steps back in your chain of ancestry.

 

But let’s also remember that every commit is a full snapshot of your project, and that every commit builds on the one before it. Logically, seeing the differences between your current commit and its fiftieth parent should be roughly the same—certainly the same in spirit—as seeing a summary of your last fifty changes, because those changes should all still be around in your current commit.
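The "roughly" matters. As a small sketch (with invented file names), here is a history where one commit adds a file and a later commit removes it again. Because git diff only compares the two end points, the file that came and went never appears in the stats.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'first draft\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# The second commit adds a scratch file...
printf 'temporary\n' > scratch.txt
git add scratch.txt
git commit -q -m "Add scratch file"

# ...and the third removes it again while revising the notes.
git rm -q scratch.txt
printf 'second draft\n' > notes.txt
git commit -q -a -m "Remove scratch file, revise notes"

# Compare the commit two steps back with the current commit.
net_stat=$(git diff --stat HEAD~2 HEAD)
echo "$net_stat"
```

Only notes.txt is listed: the diff reports the net difference between two snapshots, not a play-by-play of everything that happened in between. For the play-by-play, git log --stat is the better tool.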

 

If you find the stats valuable, you can even include them in your git log output using the --stat option. Here, I’m asking git log to show me a log that includes stats, plus a custom format for the log entries, limited to the changes since the commit before last.

$ git log --stat --pretty=format:"%h (%an) %s" HEAD~1..
d08a7f2 Release 1.5.10

CHANGELOG | 3 +++
lib/webfontloader.rb | 2 +-
webfontloader.gemspec | 2 +-
webfontloader.js | 4 ++--

4 files changed, 7 insertions(+), 4 deletions(-)

 

TAGGING COMMITS

In addition to all the other kinds of references we’ve seen—long and short commit IDs, branch names, and the HEAD pointer—commits can be given permanent, human-friendly names, called tags. Tags are a lot like branches in that they assign human-readable names to a particular commit.

But unlike a branch name, which stays the same while the commit it points to moves forward every time the branch gets a new commit, a tag always references one specific commit. Tags mark moments in history that are interesting or significant.

 

Depending on your project, you may never use tags, or you may use them a lot. Unlike branches, which are central to almost every Git workflow I’ve seen, tags have no intrinsic meaning or intended use, so many projects never use them. For web sites and applications, tags’ value may depend entirely on how you release code to your production servers.

 

Many teams deploy by just updating the servers with the latest stuff from master; they control what code goes out to the public by laying down rules about when and how commits can be merged in, and quality checks to ensure everything in master is always production-ready.

 

For most of us, branches are not just simpler but more meaningful—the branch name master doesn't just reference a commit, it references the latest commit on a certain line of work. Branch names change less often, and so involve less work.

 

Git tags are commonly used for software libraries or frameworks that are shipped in numbered versions. For instance, the code for version 4.2.0 of the Ruby on Rails framework matches up with the rel-4.2.0 tag on their Git repo, which in turn points to commit 7847a19, whose message, helpfully, is “Preparing for 4.2.0 release.”

 

The official 4.2.0 release is in the form of a Ruby package hosted on rubygems.org; the tag serves to connect that package with the commit used to produce it.

 

To tag a commit, you’ll use the aptly named git tag command. It always takes as its first parameter the tag name, which can be any string. Here, we’ll tag the current commit on our current branch with the name fhqwhgads. (If this seems like a bizarre example, you should know I once worked on a team that tagged our biweekly website releases after our favorite stores, e.g., prada.0.)

 

$ git tag fhqwhgads

Having tagged the commit, we can now use the name fhqwhgads anywhere Git takes a commit ID. If the commit we want to tag isn’t checked out right now, we can pass its commit ID to git tag after the tag name:

 

$ git tag fhqwhgads 8891c37

Because nothing in Git can ever be simple, it turns out there are two kinds of tags. The kind we just created is a lightweight tag; it’s stored in the repository as just a name pointing to a commit, similar to a branch.

The other kind is an annotated tag, which, in addition to a name and commit reference, can also include a message, similar to a commit message.

 

$ git tag fhqwhgads -a -m "Fhqwhgads release (22 Dec 2014)"
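One way to see the difference for yourself is to ask Git what kind of object each tag name refers to. This is a minimal sketch in a throwaway repository; the tag names rel-light and rel-annotated are invented for the example.

```shell
# Throwaway repository; tag and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'hello\n' > index.html
git add index.html
git commit -q -m "Initial commit"

# Lightweight tag: just a name pointing straight at the commit.
git tag rel-light

# Annotated tag: a separate tag object that carries its own message.
git tag -a -m "Example annotated release" rel-annotated

# Ask Git what type of object each name refers to.
light_type=$(git cat-file -t rel-light)
annotated_type=$(git cat-file -t rel-annotated)
echo "rel-light is a $light_type; rel-annotated is a $annotated_type"

# Either way, both names lead back to the same commit.
git rev-parse 'rel-light^{commit}' 'rel-annotated^{commit}'
```

The lightweight tag reports the type commit, because it is nothing more than a pointer. The annotated tag reports the type tag, because Git stored a small extra object holding the message, the tagger, and the date.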

Tags, like branches, can and should be shared on a remote, and you can push them the same way, using git push with the name of the remote (here, origin) followed by the name of the tag:

 

$ git push origin fhqwhgads
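You don't need a hosting account to try this. The sketch below assumes a second, bare repository in a temporary folder standing in for the shared hub; the paths, the remote name origin, and the tag name rel-example are all illustrative.

```shell
# A temporary folder holds both the stand-in hub and the working project.
work=$(mktemp -d)
git init -q --bare "$work/hub.git"
git init -q "$work/project"
cd "$work/project"
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'hello\n' > index.html
git add index.html
git commit -q -m "Initial commit"
git tag rel-example

git remote add origin "$work/hub.git"

# Push one tag by name, the same way you would push a branch.
git push -q origin rel-example

# Ask the remote which tags it now has.
remote_tags=$(git ls-remote --tags origin)
echo "$remote_tags"
```

The last command lists the tag as refs/tags/rel-example, which is the full name Git uses for it behind the scenes.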

There aren't many rules surrounding tags, but the few rules that do exist are strict, as we’ll see next.

 

Tag names must be unique

Just as it would be a huge problem if two different versions of your project could have the same name, Git does not allow you to create a tag if another tag by the same name already exists, and will reject a pushed tag if it already exists on the server.

 

Git will, however, let you give a tag the same name as a branch, or vice versa. But if you then use that name somewhere it could mean either one, Git has to guess. The git checkout command gives precedence to the branch (most other commands resolve the name to the tag instead) and warns you that the guess may not have been the right one. Here’s what happens when, in a repo that has both a tag and branch named branch-2, I try to check out branch-2:

$ git checkout branch-2

warning: refname 'branch-2' is ambiguous.

Switched to branch 'branch-2'

 

To make your life easier, avoid giving branches and tags the same names. A lot of teams who use tags will prepend something to their tag names to disambiguate them from branches; our fhqwhgads tag might instead be called rel-fhqwhgads to distinguish it from any fhqwhgads branches that may be flying around. This has the added benefit of saying what the tag refers to; in this case, rel is short for “release.”
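If you do end up with a clash, you can always spell out which one you mean by using the full reference name: refs/tags/ for a tag, refs/heads/ for a branch. Here is a small sketch that deliberately creates the clash from the example above in a throwaway repository.

```shell
# Throwaway repository; the name branch-2 mirrors the example above.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'one\n' > file.txt
git add file.txt
git commit -q -m "First commit"
git tag branch-2                 # the tag points at the first commit

printf 'two\n' > file.txt
git commit -q -a -m "Second commit"
git branch branch-2              # the branch points at the second commit

# Full reference names leave no room for guessing.
tag_commit=$(git rev-parse refs/tags/branch-2)
branch_commit=$(git rev-parse refs/heads/branch-2)
echo "tag:    $tag_commit"
echo "branch: $branch_commit"
```

The two commands print different commit IDs, which is exactly why the short name on its own is ambiguous.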

 

Tags are meant to be permanent

Git will let you change things like tags. More precisely, it will allow you to delete a tag and replace it with a new one under the same name. (To wit: if you do tag the wrong commit by accident, which sometimes happens, you can use git tag -d <tagname> to delete the bad tag and then create a new one pointing at the right commit.)
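As a sketch of that fix, assuming a tag that has not been pushed anywhere yet, here is the whole sequence in a throwaway repository. The tag name rel-example is made up.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'v1\n' > index.html
git add index.html
git commit -q -m "First commit"
printf 'v2\n' > index.html
git commit -q -a -m "Second commit"

# Oops: this tags the older commit instead of the latest one.
git tag rel-example HEAD~1

# Creating a second tag with the same name is refused.
if ! git tag rel-example HEAD 2>/dev/null; then
  echo "Git refused to create a duplicate tag"
fi

# Delete the bad tag, then re-create it on the right commit.
git tag -d rel-example
git tag rel-example HEAD

tagged_commit=$(git rev-parse rel-example)
echo "rel-example now points at $tagged_commit"
```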

 

Having said that, a tag’s purpose is to serve as a stable nickname for a specific commit—a job made more difficult if the names or commits underneath tags can change. Once you’ve pushed a tag to a remote—especially a remote you’ve shared with other people, like a collaborative hub—try never to change it.

 

There may be times when you need to, or when re-creating a tag is simpler than creating a new tag with a new name, but I’ve found these situations to be exceptional, and not worth the headache of having to message your entire team to explain that rel-wombat.0 may or may not really be the commit it’s supposed to be.

 

TIME TRAVELING WITH git checkout

Reviewing what we’ve done is nice, but Git allows you to truly revisit the past by checking out old commits, using the same git checkout command you use to switch branches. I don’t just mean “checking out” in the colloquial sense (“Hey, check out this cute panda video”) but in the version-control sense: when you check out a commit (or, for that matter, a branch), you’re not just seeing a previous version of your work; you’re resetting your local copy of the project to match whatever version you asked for. “Checking out” is used here in the same sense as checking out a library book.

 

And if it’s unclear in this metaphor where your working tree fits in, remember that even if you’re working progressively—adding new commits to a branch, rather than revisiting old ones—you still always have a version of the project checked out: the branch you’re working on, to which you can add more commits.

 

Checking out a commit by itself differs from checking out a branch only in that you’re not really expected to add any new commits after you check it out. That’s not to say you can’t add commits, though. To explain this distinction, let me give you an example.

 

Let’s say you start getting reports from your users that something you know was working in a certain browser or device when you first deployed your project a few weeks ago is now no longer working. Let’s also say that when you made that first production push, you also tagged the commit you pushed as rel-v1.0.

 

The first thing to do is confirm that the code you deployed originally actually did work, by checking out the old version and opening it up in a browser. Here we’ll assume it’s a static website that you can open directly in a browser, but if your site has a build step (using Grunt, Middleman, or some other tool), it should work here, too. Just run your build or server task after checking out the old site.

 

To do this, run git checkout with the tag or commit ID you want to return to:

$ git checkout rel-v1.0

Note: checking out 'rel-v1.0'.

HEAD is now at 591672e... Release v1.0

 

This command did what we wanted it to do: Git has reset the files and folders in our working tree to match the version of our project we’re trying to return to, which was commit 591672e, also known by its tag, rel-v1.0. We can now open up the website and confirm that, yes, it worked when we shipped it.

From here, we might continue our investigation by looking at the log, reviewing the commits that have been added to master since this one (git log rel-v1.0..master), or look at the actual changes between this version and the latest one (git diff rel-v1.0..master).

 

If a particular commit seems likely to have introduced the bug, you can check it out to confirm (or allay) your suspicions. Git even offers a tool (git bisect) that automates this kind of hunt: it runs a binary search through your history, checking out one commit after another, until it finds the commit that caused a particular issue.
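Here is a rough sketch of what a bisect session can look like. Everything in it is invented for the demonstration: the status.txt file stands in for your site, and the grep command stands in for whatever check tells you the site works. The git bisect run command repeats that check on each commit Git picks, treating a zero exit status as "good" and a non-zero one as "bad."

```shell
# Throwaway repository; status.txt and the grep "test" are stand-ins.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'ok\n' > status.txt
git add status.txt
git commit -q -m "Working version"
git tag rel-v1.0

printf 'one\n' > a.txt
git add a.txt
git commit -q -m "Unrelated change 1"

printf 'broken\n' > status.txt
git commit -q -a -m "Break the status file"

printf 'two\n' > b.txt
git add b.txt
git commit -q -m "Unrelated change 2"

# Tell Git the current commit is bad and the tagged release was good.
git bisect start HEAD rel-v1.0

# Let Git run the check on each candidate commit.
bisect_output=$(git bisect run grep -q ok status.txt)

# When the search finishes, refs/bisect/bad names the first bad commit.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
echo "First bad commit: $first_bad_subject"

# Return to the branch we started on.
git bisect reset >/dev/null 2>&1
```

With four commits this saves almost nothing, but because each step halves the remaining candidates, the same session on a history of a thousand commits needs only about ten checks.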

 

What is the detached HEAD state?

When you check out a commit, as opposed to a branch, Git puts you into the “detached HEAD” state: your computer’s HEAD pointer is pointed at a particular commit, but not at a branch. You’re “detached” in the sense that you’re not working on any branch. In practice, this means you can make new commits, and they will be saved, but you won’t have a branch name to refer back to them.

 

You’re not really supposed to add commits while detached. Most of the time, Git expects you to check out an old commit to review or test the old code, not to make changes. (That’s what branches are for.)

 

But this can be a feature: commits made in the detached state can be used as a scratch pad. While detached, you’re free to make experimental changes and commit them, and discard any commits you make in this state without impacting any branches by performing another checkout.

 

There’s even less risk than usual that a bad change will find its way into everyone else’s copy of the project, or out to production, because commits made in the detached state are homeless unless you decide to move them into a branch with git branch or git checkout -b.

 

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. For example:

 

$ git checkout -b new_branch_name
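To make the scratch-pad idea concrete, here is a minimal sketch in a throwaway repository: check out a tag, confirm that HEAD is detached, commit an experiment, and then give that commit a home. The branch name experiment and the file names are hypothetical.

```shell
# Throwaway repository; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'v1\n' > index.html
git add index.html
git commit -q -m "Release v1.0"
git tag rel-v1.0
printf 'v2\n' > index.html
git commit -q -a -m "Later work"

# Checking out a tag (rather than a branch) detaches HEAD.
git checkout -q rel-v1.0

# While detached, HEAD is not a symbolic reference to any branch.
if git symbolic-ref -q HEAD >/dev/null; then
  state="on a branch"
else
  state="detached"
fi
echo "After checking out the tag: $state"

# Commits still work here; they just have no branch name yet.
printf 'experiment\n' > scratch.txt
git add scratch.txt
git commit -q -m "Try an experiment"

# Keep the experiment by creating a branch where we stand.
git checkout -q -b experiment
current_branch=$(git symbolic-ref --short HEAD)
echo "Now on branch: $current_branch"
```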

Returning from the detached HEAD state (reattaching the HEAD, so to speak) is as simple as checking out a branch—in this case, returning to master:

 

$ git checkout master

Previous HEAD position was 591672e... Release v1.0
Switched to branch 'master'

One caveat: git checkout handles any reference that isn’t a branch name as if it’s just a commit, even if you’re asking for it using a ref that includes the branch name, which can inadvertently lead you into the detached state.

 

For example, if you use git checkout make-logo-bigger to check out the branch named make-logo-bigger, then you’ve checked out a branch. However, if you ask to check out origin/make-logo-bigger (a remote branch reference), you’ve checked out the commit that’s currently at the head of that branch, but not the branch.
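The difference is easy to reproduce. This sketch assumes a local folder standing in for the hub and a fresh clone of it; the paths are invented, and the branch name is borrowed from the example above.

```shell
# A temporary folder holds a stand-in hub and a clone of it.
work=$(mktemp -d)
git init -q "$work/hub"
cd "$work/hub"
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'logo\n' > logo.txt
git add logo.txt
git commit -q -m "Add logo"
git branch make-logo-bigger

git clone -q "$work/hub" "$work/clone"
cd "$work/clone"

# Checking out the remote branch reference detaches HEAD.
git checkout -q origin/make-logo-bigger
if git symbolic-ref -q HEAD >/dev/null; then
  after_remote_ref="on a branch"
else
  after_remote_ref="detached"
fi
echo "After origin/make-logo-bigger: $after_remote_ref"

# Checking out the plain name lets Git create a local branch that
# follows the remote one, so HEAD is attached again.
git checkout -q make-logo-bigger
after_branch_name=$(git symbolic-ref --short HEAD)
echo "After make-logo-bigger: on branch $after_branch_name"
```

In the clone there is no local branch called make-logo-bigger to begin with, so the second checkout relies on Git's habit of creating one when exactly one remote has a branch with that name.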

 

WRAPPING UP

 

One phrase that I’ve barely used in this blog, outside of a few examples (where I’ve included it as an in-joke), is directed acyclic graph (DAG). A directed acyclic graph is a kind of data structure in which individual nodes point to other nodes, the references building on one another to form chains of information, spreading out like the roots of a tree, growing endlessly as we work, adding to the graph with every commit.

 

These kinds of graphs are often used to visualize Git branches, and it’s not uncommon to see even the most basic Git tutorial include a bunch of branching diagrams.

To be fair, DAGs are a somewhat advanced concept, and most Git tutorials don’t go so far down the rabbit hole as to mention them by name, even if they employ them as visual aids.

 

I mention DAGs here, as we wrap up our time together, to make a point about the philosophy of this blog. When trying to explain Git, it’s common to focus on the big picture: whole networks of repositories pushing and pulling one another, whole systems of branches flowing into and out of one another. I’m not disparaging such attempts: these things are real, and they’re spectacular.

 

But I also find that looking at these things as systems misses Git’s most wonderful quality: people like you (and me, and our teammates) each making changes, evolving our projects one step at a time, crafting histories.

 

The graph just isn’t that important if what you’re trying to do is save the next version of your project or share changes with your team. And although the history we collaborate on via Git can be modeled as a graph, it can also be rendered as a list of incidents: as a story.

 

Admittedly, a Git repository is an odd place to tell a story; Git’s command-line interface, not the most natural way to tell one. Back at the very beginning of this blog, I described Git’s interface as a “leaky abstraction.” Git tries, but doesn’t always succeed, to protect us from having to understand the many complex things going on when we run a particular command. In not succeeding, Git encourages us to learn about what’s actually going on behind the scenes.

 

But the stories we tell together are just as real and beautiful as the information structures the creators of Git have created to contain them. And now, I hope, you’ll be armed with the knowledge to tell these stories with a minimum of fear.

 

CONCLUSION

In this blog, we’ve covered everything from the difficulty of revising writing carved in stone to tips for how best to take advantage of a detached HEAD. Along the way, we’ve learned a few things about Git: what commits are made of, how each commit is a whole version of your work, and how commits, along with remotes, branches, and other stuff, come together to create a wild new landscape of things that (good news!) you now need to worry about in your daily work.

 

That’s okay! We’ve only scratched—by design—the surface of what Git is capable of. It’s less important for you to come away from this blog knowing every single Git command than it is for you to know how Git thinks and, from there, to understand that Git is neither evil, nor magical, nor scary. It’s just a tool and, if you use it properly, it will always serve you well.

 

More than that, though, you can use the commands and functions we’ve covered in this blog as building blocks for finding your own satisfying Git workflows, and as jumping-off points for learning new tricks. Depending on the kind of work you do, you’ll either find that the knowledge imparted by this blog is more than enough to help you get the job done, or you’ll feel equipped to ask more incisive questions about how Git can better serve you in the future.

 

RESOURCES

 

Command Reference

Here’s a quick list of every Git command referenced in this blog, plus a few others. Arguments in square brackets (e.g., [thing]) are optional.

git config [--global] <key> <value>

Updates Git’s settings, modifying the preference identified by <key>, such as user.email, with the given <value>, such as Thesis@ThesisScientist.com. The --global flag saves preferences to a file in your home directory, so Git will apply them to every project on your computer. Otherwise they’re saved and applied only within a specific project.

 

git init

Creates a new Git project inside the current working directory—that is, if you’re inside a directory named my-awesome-project that contains a website you’re working on, running git init will turn the folder into a fresh Git repository, ready to use.

 

git clone <url> [directory]

Copies an existing Git project located at the given url to your computer as a new directory. By default, the directory will be named after the Git repository in the URL—the repo https://gitforhumans.info/rails.git would be copied into a folder named rails, but you can provide your own directory name as an argument if you want.

 

git status [-s] [path/to/thing]

Outputs the status of your working copy: identifies which files are modified but not staged, or added but not committed. The optional --short or -s flag gives you a shorthand version of the status readout. By default, git status will show you the status of everything in your project, but you can give it a directory or file path to limit the results.

 

git add [--all] filename.txt

Adds a changed file to the staging area for inclusion in the next commit.

 

git rm folder/filename.txt

A shortcut command that deletes the file at the given path, then stages the deletion for your next commit. If you’ve already deleted the file elsewhere (say, via the Finder), it just stages the change.

 

git mv oldpath.txt newpath.txt

Another shortcut that moves the file at oldpath.txt to newpath.txt, then stages that change.

 

git reset filename.txt

The opposite of git add: having staged a change to filename.txt, you can use git reset to un-stage it.

 

git commit [-a] [-m "Your message"]

Adds a commit with any changes you’ve staged using git add. The --all (or -a) option is a handy shortcut: it will automatically stage any changes you’ve made to files Git is already tracking. You can use the --message (-m) argument to specify your commit message; if you leave it out, Git will open up your default text editor (or whatever editor you’ve configured in Git’s settings).

 

git branch [-r|-a]

Shows a list of all your branches. By default, it shows you only branches on your local copy of the repo. The -r option can show you all the branches you’ve fetched from remotes; -a shows you both local and remote branches.

 

git branch <branchname> [<commit>]

If you give a branch name as an argument to git branch, it’ll create a branch with that name, starting at the current commit (or at any commit you specify, if you provide its ID).

 

git checkout [-b] <branchname-or-commit>

Updates your working copy to match the given branch or commit—in essence, switching you into that branch/commit. If you check out a branch, Git sets that as the current branch so you can add commits to it. If you check out a commit or tag, Git “detaches” from any branch—you can make commits, but they will only be retrievable by their commit IDs.

 

git merge <otherbranch>

Merges otherbranch into the current branch, provided there are no conflicts. If there are conflicts, Git copies over and stages as much of what’s in the other branch as possible, marking the conflicted files so you can resolve the problem yourself before committing.

 

git remote add <name> <url>

Adds a remote with the given name and URL to your local Git project settings.

 

git remote rm <name>

Removes the remote from your project settings along with any remote tracking branches you may have fetched from the server. Note that this only deletes the remote from your local settings—everyone else’s computers, and the server, are not affected.

 

git push <remotename> <branchname>

Pushes the current state of branchname to the remote named remotename.

 

git pull <remotename> <branchname>

Pulls down the current state of branchname from the remote to your local copy, and attempts to merge it into your current branch.

 

git fetch <remotename>

Downloads any new commits and branches from the remote into your local repository, without touching your own branches or your working copy. When you run git pull, a fetch happens automatically.

 

git log [--oneline] [--pretty] [<branchname-or-commit>]

Shows a reverse-ordered list of commits, starting from the current head (or any one you specify by branch name or ID). You can use the --pretty option to customize the output; --oneline is a shortcut for the most used output format, consisting of a short commit ID and the commit message on each line.

 

git diff [--stat] [<branchname-or-commit>]

Generates a “diff”—a visual representation of the differences between two commits. The --stat option produces a summary view showing a list of files changed, with how many lines were added and deleted in each one.

 

git tag [-a] [-m <message>] <tagname> [<commit>]

Tags a commit with the name you provide, which you can use as a static, friendly name for that commit. The -a flag tells Git to create an annotated tag, which includes information about when the tag was created, by whom, and a message saying what it’s about, just like a commit. (Otherwise, Git creates a “lightweight” tag, which references a commit but doesn’t create any of that other info.) If you create an annotated tag, make sure to include the --message/-m argument, again, just like a commit.

git tag -d <tagname>

You shouldn’t need to delete a tag, but if you do, you can do it by passing the -d (for “delete”) option to git tag.

git tag -l

Outputs a list of all the tags in your repository.

 

git push --tags <remotename>

As a safeguard against accidentally sharing a tag that you might not be ready to share, Git doesn’t push any of your tags unless you name them explicitly or include the --tags option.

For an exhaustive list of all Git’s commands and complete details on how to use them, check out the documentation on Git’s website.

 

Recommended Git apps

In this blog I’ve chosen to focus on Git’s command line interface in order to best demonstrate how Git thinks, and I still recommend that you start with the command line. However, there are many excellent time-saving Windows and Mac apps you can use once you’re up and running.

 

GitHub Desktop

Whether or not you host your code on GitHub, their desktop apps for Mac and Windows are among the very best, and they’re free. You can visually stage and commit changes, create and switch between branches, push and pull with remotes, and if you do host on GitHub, the desktop app makes it easy to create pull requests or open a compare view.

 

Tower

For Mac power users willing to spend $70, Tower offers many more options and features. Where GitHub Desktop focuses on the basics, Tower can also handle resolving merge conflicts, cherry-picking commits, and lots more.

 

SourceTree

More complex and powerful than GitHub’s apps, but lacking some of Tower’s slickest features, SourceTree (which is free) is a good choice for someone who wants a little more power in a Git app, but doesn’t want to spend money.

Many popular coding tools also include built-in support for Git, or allow you to add it via plugins, so you can commit changes without leaving the app. Atom, Coda, Sublime Text, TextMate, BBEdit, Xcode, and Visual Studio Code all work with Git out of the box.

 

Git hosting services

GitHub

The biggest—and, at one time, kind of the only—name in Git hosting. Chances are, if you work with code you’ve had to do something on GitHub, because it’s what everyone uses. Ubiquity aside, GitHub remains arguably the best choice for most people: the company continues to invest in tools and resources that make it easier to collaborate via Git (such as Pull Requests), as well as new and interesting tools like GitHub Pages (web hosting powered by a Git repo).

 

GitHub charges money to host private projects for yourself or your organization. Public projects where anyone can pull or download your code, but only you and your teammates can push changes, are always free. There’s also an enterprise edition that costs lots of money, but you can run it on your own servers for maximum control over your data.

 

Bitbucket

Not as slick as GitHub, Bitbucket has one nice benefit for hobbyists or small businesses: individuals and small teams can host unlimited private repos for free. Although Bitbucket lacks GitHub’s vast community, I personally use both: GitHub for public projects or collaborative work, Bitbucket for small personal projects.

 

Beanstalk

Specializing in paid, private repos, Beanstalk has a few nice features for web developers, most notably a built-in deployment tool that automatically updates your web servers after new code is pushed to your repository.

Finally, if you’re handy with the command line and either need to have total control over your data or enjoy a bit of extra nerdery, it’s not that hard to roll your own hosting.

 

Because Git can talk to a remote over plain SSH, nearly any Linux server you can log in to can be set up to host Git repositories. The folks at DigitalOcean have a handy guide to setting up a simple Git server on one of their virtual servers.
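The heart of a do-it-yourself host is a "bare" repository: one with no working copy, meant only to be pushed to and pulled from. As a rough sketch, here is the whole round trip using a temporary folder in place of a real server. On an actual server you would run the git init --bare step there and use an SSH address as the remote URL; the paths and the remote name below are assumptions for the demo.

```shell
# A temporary folder stands in for a directory on your server.
server=$(mktemp -d)
git init -q --bare "$server/project.git"

# A separate temporary folder stands in for your own computer.
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'hello\n' > index.html
git add index.html
git commit -q -m "Initial commit"

# With a real server, this URL would be an SSH address instead of a path.
git remote add origin "$server/project.git"

# Push whichever branch we are on, then ask the "server" what it has.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"
pushed=$(git ls-remote --heads origin)
echo "$pushed"
```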