How to squeeze information from your repository with the command line

A common misconception is that the command line interface (CLI) of the Git client is not visual, and thus you need a graphical client to really see and understand what is in your repository. In reality, the Git standard interface offers plenty of options to access information from your repository. Let’s take a look at how to access repository information directly from the command line.

Basic tricks to improve your Git user experience

Before we start, let’s make sure we are not missing out on the basic user experience features of the Git client. For my examples, I’m using the repository for jasmine—an open-source library in JavaScript for unit testing.

Colors

First, if your git outputs look like

Image description

instead of

Image description

it means that your user interface is set to be monochromatic. You can change it with:

git config --global color.ui true

Aliases

As you will see, there are plenty of long commands in Git that you will use over and over again. You don’t need, nor want, to type them all the time. There are two ways to speed it up:

  • defining aliases for commands—something I show in an article about the git tree.
  • letting the shell autocomplete your commands and arguments. I use Zsh, and its autocomplete is great. In my other article, you can read more about general CLI tricks.

For me, the best idea is to try out different combinations of parameters, and once you know what you like—make an alias for it.
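As a rough sketch of what defining an alias looks like, the snippet below sets up a short status alias. It works in a throwaway repository created with mktemp (an assumption made only to keep the demo away from your real configuration) and stores the alias locally; in daily use you would most likely add --global.

```shell
# Throwaway repository, so the demo does not touch your real setup.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Repository-local alias; add --global to make it available everywhere.
git config alias.st "status -sb"

# An untracked file gives the short status something to report.
echo "test" > TEST.md
st_output=$(git st)
echo "$st_output"
```

After this, git st behaves exactly like git status -sb inside that repository.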

Right here, right now

First, let’s focus on the current state of the repository.

Current state with git status

A top-level summary of where we are right now.

$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

As you can see, the default output is rather verbose. Let’s make small changes in the repo:

  1. An update to an existing file—we append “test” to README.md:
    $ echo "test" >> README.md
    
  2. New file with one line “test”:
    $ echo "test" > TEST.md
    

After these two operations, we have:

$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   README.md

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    TEST.md

no changes added to commit (use "git add" and/or "git commit -a")

As you can see above, the output tries to explain all the details about the current state of the repository. Once you understand it well, you’ll find a less verbose version more useful:

$ git status -sb
## main...origin/main
 M README.md
?? TEST.md

I have an alias git st pointing to this command, and I constantly use it.
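If you want to use the status in a script, git status --porcelain is usually a better fit, because its format is meant to stay stable for scripts. Below is a minimal sketch, assuming a throwaway repository and a hypothetical helper called is_clean, that checks whether the working tree has any changes.

```shell
# Throwaway repository with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

# Hypothetical helper: --porcelain prints one line per changed or
# untracked file, and nothing at all when the working tree is clean.
is_clean() {
  [ -z "$(git status --porcelain)" ]
}

if is_clean; then state_before="clean"; else state_before="dirty"; fi
echo "test" >> README.md
if is_clean; then state_after="clean"; else state_after="dirty"; fi

echo "before: $state_before, after: $state_after"
```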

If we were to commit those changes, the status would change accordingly—this time it would be showing that we have some commits on top of the version from the remote.

Searching the repository with git grep

Grep is a command line utility that lets you search through your files. Git provides a similar command that lets you search through files tracked by your repository. So, when I run git grep fdescribe, I get:

Image description

Note! As the output of this command is longer than my screen, Git shows it with a “pager”—by default, less. A quick navigation summary is as follows:

  • you scroll up and down with arrows
  • you exit with q

To learn more, you can read a section of my other article.

git grep supports many of the same parameters as grep:

  • -l to show only filenames:
    $ git grep -l fdescribe
    lib/jasmine-core/jasmine.js
    release_notes/2.1.0.md
    spec/core/EnvSpec.js
    spec/core/SuiteBuilderSpec.js
    spec/core/integration/EnvSpec.js
    spec/core/integration/SpecRunningSpec.js
    src/core/Env.js
    src/core/Runner.js
    src/core/Suite.js
    src/core/SuiteBuilder.js
    src/core/requireInterface.js
    
  • -A <number> and -B <number> to show lines after or before the match—to provide you with more context. For example, for git grep -A 3 -B 3 fdescribe, I get:

Image description
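Two more grep-style flags that I find handy are -n (line numbers) and -c (match count per file). The sketch below tries them in a throwaway repository with made-up file names; it also shows that a file that was never added is not searched.

```shell
# Throwaway repository with one tracked and one untracked file.
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'describe("a");\nfdescribe("b");\nfdescribe("c");\n' > spec.js
git add spec.js                                   # tracked: git grep sees it
printf 'fdescribe("untracked");\n' > scratch.js   # never added: skipped

with_numbers=$(git grep -n fdescribe)   # file:line:matching text
per_file=$(git grep -c fdescribe)       # file:number of matching lines

echo "$with_numbers"
echo "$per_file"
```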

Last commit with git show

To see the current commit, I can run git show:

Image description

If you want to see some other commit, you can just add its identifier (commit ID, branch name, etc.) to this command:

$ git show b3236316
commit b323631611673e1ccd2f300d70de4272db2da69e
Author: Steve Gravrock <sdg@panix.com>
Date:   Mon Jan 30 17:57:47 2023 -0800

    Pin Grunt to <1.6.0 for compatiblity with Node 12

diff --git a/package.json b/package.json
index 3bb03006..fced8840 100644
--- a/package.json
+++ b/package.json
@@ -37,7 +37,7 @@
     "eslint": "^7.32.0",
     "eslint-plugin-compat": "^4.0.0",
     "glob": "^7.2.0",
-    "grunt": "^1.0.4",
+    "grunt": ">=1.0.4 <1.6.0",
     "grunt-cli": "^1.3.2",
     "grunt-contrib-compress": "^2.0.0",
     "grunt-contrib-concat": "^2.0.0",

Changes with git diff

In the git status sections above, we made a few changes to the repository. With git diff, we can check exactly what changed:

Image description

Note that it only shows changes to files that are already tracked by the repository. Our new file, TEST.md, is untracked, so it does not appear. When we add all the changes with git add ., the diff becomes empty: by default, git diff shows the difference between our working copy (files in the folder) and the staging area (what is already added and ready to be committed). After adding everything, we can see all the staged changes with git diff --cached:

Image description

This output is the diff of what will go into a commit if we were to create one right now. I often use it to see if my changes are the way I want them to be as I commit.
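To make the difference between the two diffs concrete, here is a small sketch that repeats the README.md and TEST.md changes in a throwaway repository (created only for this demo) and uses --name-only to keep the output short.

```shell
# Throwaway repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

echo "test" >> README.md   # change to a tracked file
echo "test" > TEST.md      # new, untracked file

# Working copy vs. staging area: only the tracked file shows up.
unstaged_before=$(git diff --name-only)

git add .

# Everything is staged now, so the plain diff is empty...
unstaged_after=$(git diff --name-only)
# ...and --cached lists what would go into the next commit.
staged=$(git diff --cached --name-only)

echo "unstaged before add: $unstaged_before"
echo "staged after add: $staged"
```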

All the files with git ls-files

Here’s a less common command that is helpful when you write scripts:

$ git ls-files
.circleci/config.yml
.editorconfig
.gitattributes
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.yml
.github/ISSUE_TEMPLATE/config.yml
.github/ISSUE_TEMPLATE/feature_proposal.yml
.github/ISSUE_TEMPLATE/support_request.yml
.github/PULL_REQUEST_TEMPLATE.md
.gitignore
CODE_OF_CONDUCT.md
Gruntfile.js
MIT.LICENSE
README.md
RELEASE.md
TEST.md
grunt/config/compress.js
grunt/config/concat.js
…

For example, when you pipe it together with grep, which I’ve covered before, you can achieve pretty neat results:

Image description

Look into the past

The key job of version control is to keep a history of your codebase. To make any use of the history that is kept, we need ways to access it. Let’s go through a few commands that allow that.

Browsing the history with git log

With git log, we can access a list of all the changes that led to the current state of our repository. As you can see, the default output is verbose:

Image description

Fitting more on the screen with --oneline

For getting more commits on the screen, you can add the --oneline parameter to the command:

Image description

The complete message of the most recent commit is 3 lines long:

Pin eslint-plugin-compat to <4.1.0 to fix import error on CI

See <https://github.com/amilajack/eslint-plugin-compat/issues/528>.

When we ask for one line view, the same commit is described with only the first line of the message:

Pin eslint-plugin-compat to <4.1.0 to fix import error on CI

This is a reason for a popular convention with Git messages:

  • the first line: message title, summarizing the commit in less than 50 characters
  • an empty line
  • the rest of the message, wrapping at 80 characters
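One way to follow this convention from the command line is to pass -m twice: Git joins the values as separate paragraphs, so the first becomes the title and the second the body. The sketch below uses a throwaway repository and an invented commit message.

```shell
# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "x" > file.txt
git add file.txt
# First -m is the title, second -m becomes the body after an empty line.
git commit -q -m "Pin dependency to fix CI" \
              -m "Longer explanation that only the full log shows."

title=$(git log -1 --format=%s)    # %s is the subject (first line)
oneline=$(git log --oneline -1)    # short hash followed by the title
git log -1 --format=%B             # full message: title, empty line, body
```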

Most recent commits with -<number>

When the git log output is too long to fit on the screen, it opens the pager, so we can scroll through it. Usually, I know I’m interested in only the last few commits. We can limit the output by adding -10 (or any other number) to the log command:

Image description

Focus on a file or folder with -- <relative-path>

Sometimes you want to see which commits changed a given file or folder. Many Git commands can be applied to a limited scope of files by specifying a relative path after --.

So, for example

$ git log -- README.md

shows

Image description

whereas

$ git log --oneline -- release_notes/

shows

Image description

Changes --since

Another neat feature of git log is that it can filter the changes based on a time description. So, when I add --since="last month", it shows only the two most recent commits:

Image description
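These filters can be combined. The following sketch builds a tiny throwaway history (file names, messages, and the backdated commit are all made up for the demo) and then limits the log by count, by path, and by date. I use --format=%s instead of --oneline only so the output contains no commit hashes.

```shell
# Throwaway repository with three commits, the first one backdated.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

mkdir docs
echo "a" > docs/guide.md
git add .
GIT_AUTHOR_DATE="2001-01-01T12:00:00" GIT_COMMITTER_DATE="2001-01-01T12:00:00" \
  git commit -q -m "Old docs commit"

echo "b" > README.md
git add .
git commit -q -m "Add README"

echo "c" >> docs/guide.md
git add .
git commit -q -m "Update guide"

last_one=$(git log --format=%s -1)                                 # newest commit only
docs_only=$(git log --format=%s -- docs/)                          # commits touching docs/
recent_docs=$(git log --format=%s --since="1 month ago" -- docs/)  # both filters at once

echo "$docs_only"
```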

Finding the author with git blame

Sometimes, you’ll see code so particular that you would like to know more about how it came to be. With git blame, you can find the following information for each line of the code:

  • the commit ID when it was changed,
  • the author of the change, and
  • the date when it was changed.

Example:

Image description

Note! git blame points to the most recent commit that changed each line, without evaluating how important the change was. If you do some styling updates on the codebase—indentation, adding or removing semicolons—this change will appear a lot in git blame output.
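For the specific case of whitespace-only changes such as reindentation, git blame has a -w flag that ignores whitespace when tracing where a line came from. Here is a minimal sketch, assuming a throwaway repository and two invented authors, Alice and Bob.

```shell
# Throwaway repository; the committer identity is fixed, authors vary.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Alice writes the line.
printf 'foo();\n' > code.js
git add code.js
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
  git commit -q -m "Add foo call"

# Bob only changes the indentation.
printf '    foo();\n' > code.js
GIT_AUTHOR_NAME="Bob" GIT_AUTHOR_EMAIL="bob@example.com" \
  git commit -q -am "Reindent"

# --porcelain prints an "author <name>" line for each blamed commit.
plain_author=$(git blame --porcelain code.js | sed -n 's/^author //p')
ignoring_ws=$(git blame -w --porcelain code.js | sed -n 's/^author //p')

echo "default: $plain_author, with -w: $ignoring_ws"
```

This does not help with other styling changes, such as added or removed semicolons, because those are not whitespace.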

Searching the changes in the history

Sometimes you might be interested in finding the commits that add or remove a specific string. I do this often to find the first instance of a function or pattern in the codebase I maintain. There are two commands that help with this goal:

  • git log -S "<string>"—normal string search
  • git log -G "<regex>"—regex search

In both cases, the results are shown in the same way as the log output.

String search for “jasmine-browser-runner”:

Image description

Regex search for commits that contain “fdescribe” or “fit”, done with git log -G "(fdescribe|fit)":

Image description
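The two options differ in more than regex support: -S reports commits where the number of occurrences of the string changes, while -G reports commits whose diff contains a matching added or removed line. The sketch below shows the difference on a throwaway repository with an invented three-commit history.

```shell
# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

printf 'describe("a", fn);\n' > spec.js
git add spec.js
git commit -q -m "Add suite"

# Introduces the string: occurrence count goes from 0 to 1.
printf 'fdescribe("a", fn);\n' > spec.js
git commit -q -am "Focus the suite"

# Edits a line that contains the string: the count stays at 1.
printf 'fdescribe("b", fn);\n' > spec.js
git commit -q -am "Rename the suite"

pickaxe_hits=$(git log --format=%s -S "fdescribe")   # only the introducing commit
regex_hits=$(git log --format=%s -G "fdescribe")     # every commit touching such a line

echo "-S: $pickaxe_hits"
echo "-G: $regex_hits"
```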

Context

So far, we’ve seen how to see the current state of our codebase and how to read its history—everything focused on the current state of our current branch. Git repositories can be changed in many branches in parallel, and we need tools to see our branch in the context of the whole repository.

To achieve this goal, I use the command git log --oneline --graph --decorate --all (git tree alias). The tree can be overwhelming when you see it for the first time, so let’s analyze it together:

Image description

  • HEAD: the commit we are currently at.
  • HEAD -> main: main is the branch we are on. That is, we are not in a “detached HEAD” state.
  • (tag: v4.5.0): a tag that is present in our local repository.
  • origin/*: red labels that start with origin show what our local repository knows about the remote repository called ‘origin’. In this case we have:
    • origin/main: the main branch at origin, two commits ahead of our local branch
    • origin/HEAD: the default branch of the remote repository
    • origin/5.0: a branch that exists on the remote but not in our local repository

If I commit the few changes we created above, then there will be a new commit in our local repository, and it will be displayed in the tree like this:

Image description

You can see that:

  • main and origin/main split into two divergent paths, and the last common commit is bc3a4951,
  • main branch has one commit since bc3a4951, and
  • origin/main has two commits since bc3a4951.

In a situation like this, the remote repository would reject a push of our current main branch, because the two histories have diverged. One way of addressing this issue is git rebase, something I have also covered before.
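If you only need the numbers rather than the whole tree, git rev-list can count the commits on each side and git merge-base finds the last common commit. In the sketch below, a local branch called upstream is a hypothetical stand-in for origin/main, so the demo needs no real remote.

```shell
# Throwaway repository with a base commit on main.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base"
git branch -M main
git branch upstream          # stand-in for origin/main

# One local commit on main.
echo "local" >> file.txt
git commit -q -am "Local change"

# Two commits on the other side.
git checkout -q upstream
echo "r1" > remote1.txt
git add .
git commit -q -m "Remote change 1"
echo "r2" > remote2.txt
git add .
git commit -q -m "Remote change 2"
git checkout -q main

# Prints "<only in main><TAB><only in upstream>".
counts=$(git rev-list --left-right --count main...upstream)
ahead=$(echo "$counts" | cut -f1)
behind=$(echo "$counts" | cut -f2)
common=$(git merge-base main upstream)   # last common commit

echo "ahead: $ahead, behind: $behind, common ancestor: $common"
```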

Comparison

The final thing to cover is comparing different places in the repository tree.

Commit differences with git log

The complete git tree output shows a lot of information. Sometimes your question is narrower, such as what has happened in the repository since a given point: branch, tag, or any other commit reference. For example, to see what has happened since the tag v4.5.0, we can run the following command:

$ git log --oneline main...v4.5.0^

Image description

The command consists of:

  • git log: our commit-showing powerhouse.
  • --oneline: an argument to keep the output concise.
  • main...v4.5.0^:
    • main: one reference point.
    • ...: three dots, so Git shows the commits reachable from either reference point but not from both. We could use .., but then the order of references matters: from..to.
    • v4.5.0^: the other reference point. It is one commit behind the v4.5.0 tag, so the tagged commit itself is displayed as well, which makes the output easier to read.
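To see the effect of the trailing ^ and of the dots, here is a small sketch on a throwaway linear history. The tag name v1.0 and the commit messages are invented for the demo, and --format=%s stands in for --oneline so the output has no hashes.

```shell
# Throwaway repository: First -> Release (tagged) -> After 1 -> After 2.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git commit -q --allow-empty -m "First"
git branch -M main
git commit -q --allow-empty -m "Release"
git tag v1.0
git commit -q --allow-empty -m "After 1"
git commit -q --allow-empty -m "After 2"

# from..to: commits reachable from main but not from the tag.
two_dot=$(git log --format=%s v1.0..main)
# Starting one commit before the tag includes the tagged commit too.
three_dot=$(git log --format=%s main...v1.0^)

echo "$two_dot"
echo "---"
echo "$three_dot"
```

On a linear history like this one, two and three dots select the same commits; they start to differ once the two references have diverged.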

Code difference with git diff

Another option we have to compare different points in the tree is to show all the code that changed between references. The basic command to see changes since v4.5.0 until the current state of main would be:

$ git diff v4.5.0..main
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
deleted file mode 100644
index 7070a58e..00000000
--- a/.github/ISSUE_TEMPLATE.md
+++ /dev/null
@@ -1,47 +0,0 @@
-## Are you creating an issue in the correct repository?
-
-- When in doubt, create an issue here.
…

We can add --name-only to see the file names:

Image description

This is very useful to find files or folders we want to investigate further. Then we can just add a path after --:

$ git diff v4.5.0..main -- package.json
diff --git a/package.json b/package.json
index 3bb03006..092b3c66 100644
--- a/package.json
+++ b/package.json
@@ -35,9 +35,9 @@
   ],
   "devDependencies": {
     "eslint": "^7.32.0",
-    "eslint-plugin-compat": "^4.0.0",
+    "eslint-plugin-compat": ">=4.0.0 <4.1.0",
     "glob": "^7.2.0",
-    "grunt": "^1.0.4",
+    "grunt": ">=1.0.4 <1.6.0",
     "grunt-cli": "^1.3.2",
     "grunt-contrib-compress": "^2.0.0",
     "grunt-contrib-concat": "^2.0.0",

CLI recipes

To finish, let’s see examples of a few tasks that

  • are relatively easy with the CLI and
  • are uncommon enough to be impossible, or at least impractical, with graphical Git clients.

Count JavaScript lines in the repo

Have you ever wondered how big your project is? One way to measure it is to count all the lines of code. With Git and a few other command line tools, we can achieve it directly from the command line:

$ git ls-files | grep "\.js$" | xargs cat | wc -l
   52932

Starting from the left, each command in the pipeline does the following:

  • git ls-files: lists all files tracked by the repository
  • grep "\.js$": keeps only the lines that end with “.js”, i.e. the JS files
  • xargs cat: passes the remaining file names to cat as arguments, which concatenates the files and sends the result to the standard output
  • wc -l: counts the lines of its standard input
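One caveat with the pipeline above is that xargs splits its input on whitespace, so a file name containing a space would break it. A possible variant, sketched below on a throwaway repository with made-up files, lets Git do the filtering with a pathspec and passes NUL-separated names. It assumes your xargs supports -0, which is the case for the GNU and BSD versions.

```shell
# Throwaway repository with two JS files (one has a space in its name).
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
printf 'a\nb\n' > "with space.js"
printf 'c\nd\ne\n' > src/core.js
printf 'not counted\n' > README.md
git add .

# -z separates names with NUL bytes; '*.js' also matches in subfolders.
# tr strips the padding that some wc implementations add.
js_lines=$(git ls-files -z -- '*.js' | xargs -0 cat | wc -l | tr -d ' ')

echo "$js_lines"
```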

Commit authors leaderboard

For a straightforward summary of the number of commits by author, you can use git shortlog -sen:

Image description

If we want to use some filtering, we can recreate this output by combining a few CLI tools. The command is:

$ git log --pretty="%an %ae" --since="last year" | sort | uniq -c | sort -r

and the output looks like:

Image description

The pipeline consists of:

  • git log: the same command as in many examples before
  • --pretty="%an %ae": custom formatting, so each commit is printed as the author's name and email
  • --since="last year": our filtering, so the output focuses on the most relevant part instead of the complete history
  • sort | uniq -c: sort orders the lines alphabetically, and uniq -c, which expects sorted input, counts how many times each line appears
  • sort -r: we sort again in reverse order, so the most prolific contributor ends up on top
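A small variation worth considering is sort -rn for the last step: it compares the counts as numbers, so the order does not depend on how uniq pads them or on your locale. The sketch below runs that variant on a throwaway repository with two invented authors; the --since filter is left out because all demo commits are new anyway.

```shell
# Throwaway repository; committer is fixed, authors vary per commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
  git commit -q --allow-empty -m "Alice 1"
GIT_AUTHOR_NAME="Bob" GIT_AUTHOR_EMAIL="bob@example.com" \
  git commit -q --allow-empty -m "Bob 1"
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
  git commit -q --allow-empty -m "Alice 2"

# Same pipeline as above, with a numeric reverse sort at the end.
leaderboard=$(git log --pretty="%an %ae" | sort | uniq -c | sort -rn)
top=$(echo "$leaderboard" | head -n 1)

echo "$leaderboard"
```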

Summary

All you need is the Git CLI client. If you would like to learn more about it, sign up here.