A plain shallow clone can cause you grief if you later need branch or history info, but a slightly more involved approach gives up very little.
Here’s the format:
git init ./<repository-name>
cd ./<repository-name>
git remote add <remote-name> <repository-url>
git -c protocol.version=2 fetch \
--no-tags \
--prune \
--progress \
--no-recurse-submodules \
--depth=1 <remote-name>
git checkout \
--progress \
--force -B <branch-name> \
refs/remotes/<remote-name>/<branch-name>
And a working example:
git init /tmp/keypairs
cd /tmp/keypairs
git remote add origin https://github.com/therootcompany/keypairs
git -c protocol.version=2 fetch \
--no-tags \
--prune \
--progress \
--no-recurse-submodules \
--depth=1 origin
git checkout \
--progress \
--force -B main \
refs/remotes/origin/main
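If you run these steps often, they can be wrapped in a small shell function. This is only a sketch: the name `shallow_fetch_all` and its argument order are my own invention, and it uses `git -C` so each command runs inside the new repository without a `cd`.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): shallow-fetch every branch of a
# remote, then check out one of them.
# Usage: shallow_fetch_all <repository-url> <target-dir> <branch-name> [remote-name]
shallow_fetch_all() {
    repo_url="$1"
    target_dir="$2"
    branch="$3"
    remote="${4:-origin}"

    git init --quiet "$target_dir" || return 1
    # git -C runs the command inside the new repository
    git -C "$target_dir" remote add "$remote" "$repo_url" || return 1
    git -C "$target_dir" -c protocol.version=2 fetch \
        --no-tags \
        --prune \
        --progress \
        --no-recurse-submodules \
        --depth=1 "$remote" || return 1
    git -C "$target_dir" checkout \
        --progress \
        --force -B "$branch" \
        "refs/remotes/$remote/$branch"
}
```

For the working example above, that would be `shallow_fetch_all https://github.com/therootcompany/keypairs /tmp/keypairs main`.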
Just make sure to remove unused branches - especially old ones - from your repo from time to time, to keep the multi-branch shallow fetch as light as possible.
Hint: You can delete stale branches here: https://github.com/{owner}/{repo}/branches/stale
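To spot stale branches from the command line instead, one option is to sort the remote-tracking branches by the date of their tip commit. A minimal sketch, assuming you have already fetched the remote; `list_branches_by_age` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: list remote-tracking branches of the repository in the
# current directory, oldest tip commit first.
# Usage: list_branches_by_age [remote-name]
list_branches_by_age() {
    remote="${1:-origin}"
    git for-each-ref \
        --sort=committerdate \
        --format='%(committerdate:short) %(refname:short)' \
        "refs/remotes/$remote"
}
```

This works in a shallow repository too, since the tip commit of every fetched branch is present even at `--depth=1`.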
Naive Shallow Clones
Shallow clones are ideal for CI/CD and other ephemeral environments where there’s not a lot of value in taking the time or bandwidth to download a repository’s entire history.
The naive (and official) approach for a shallow clone is this:
git clone --branch <branch-name> --depth 1 <repository_url>
Example:
git clone --branch main --depth 1 git@github.com:therootcompany/keypairs
A shallow clone is fast and bandwidth efficient, but the problem is that it leaves your repo in somewhat of a Frankenstein state. Things like git pull and git checkout will no longer work as you’d most likely expect.
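Part of the reason is that `git clone --depth` implies `--single-branch`, so the clone's fetch refspec only covers the one branch you asked for and the other branches are not visible at all. Here is a rough way to inspect that; `describe_clone` is a hypothetical helper and it assumes the remote is named `origin`:

```shell
#!/bin/sh
# Hypothetical helper: show how a clone has been narrowed.
# Assumes the remote is called "origin".
# Usage: describe_clone [repo-dir]
describe_clone() {
    repo_dir="${1:-.}"
    # "true" when history has been truncated
    echo "shallow: $(git -C "$repo_dir" rev-parse --is-shallow-repository)"
    # A single-branch clone has a refspec naming just one branch
    echo "refspec: $(git -C "$repo_dir" config --get remote.origin.fetch)"
    echo "remote branches:"
    git -C "$repo_dir" branch --remotes
}
```

On a clone made with `--branch main --depth 1`, I'd expect the refspec to read `+refs/heads/main:refs/remotes/origin/main` rather than the usual `+refs/heads/*:refs/remotes/origin/*`.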
The GitHub Actions approach
GitHub Actions (actions/checkout@v2) takes a much more complex, but more complete, approach: rather than a shallow clone of a single branch, it does a shallow fetch of all branches.
I’ll document it in full here (you can also see this by inspecting the Actions tab of any project), but you don’t need all of it. The important steps are initializing the repository, fetching the repository, and checking out the ref. It looks like this:
- Initializing the repository
    # Create and initialize an empty directory
    git init /home/app/<repo-name>
    # Manually set the remote
    git remote add <remote> <repository_url>
- Disabling automatic garbage collection
    # (not sure why - probably just an optimization to reduce ephemeral thrash)
    git config --local gc.auto 0
- Setting up auth
    # Set up authorization for the repo and all submodules
    git config --local --name-only --get-regexp 'core\.sshCommand'
    git submodule foreach --recursive "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
    git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader'
    git submodule foreach --recursive "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
    git config --local http.https://github.com/.extraheader 'AUTHORIZATION: basic <encoded-personal-access-token>'
- Fetching the repository
    git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 <remote>
- Determining the checkout info
    # this is an internal step to determine the name of the branch
    # that was just pushed to and is being worked on, such as 'main'.
- Checking out the ref
    git checkout --progress --force -B <branch> refs/remotes/<remote>/<branch>
- Setting up auth for fetching submodules
    # IMPORTANT NOTE:
    # GitHub Actions overwrites HOME prior to running this step,
    # which ensures that workflows that run before this one
    # (which may also use custom auth) don't mess this one up.
    # At home, however, you probably would do this differently
    # (see below).
    git config --global http.https://github.com/.extraheader 'AUTHORIZATION: basic <encoded-personal-access-token>'
    git config --global --unset-all url.https://github.com/.insteadOf
    git config --global url.https://github.com/.insteadOf git@github.com:
- Fetching submodules
    git submodule sync --recursive
    git -c protocol.version=2 submodule update --init --force --depth=1 --recursive
- Persisting credentials for submodules
    git submodule foreach --recursive "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :"
    git submodule foreach --recursive "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic <encoded-personal-access-token>' && git config --local --show-origin --name-only --get-regexp remote.origin.url"
    git submodule foreach --recursive "git config --local 'url.https://github.com/.insteadOf' 'git@github.com:'"
- (Display the ref sha1 hash)
    git log -1 --format='%H'
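The submodule auth step above leans on HOME being overwritten so that `git config --global` never touches your real `~/.gitconfig`. Outside of Actions, one way to get a similar effect is to run the commands under a throwaway HOME. This is a sketch of that idea, not what GitHub does internally; `with_temp_home` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: run a command with a temporary HOME so that
# `git config --global` reads and writes a throwaway ~/.gitconfig.
# Usage: with_temp_home <command> [args...]
with_temp_home() {
    temp_home="$(mktemp -d)" || return 1
    (
        # Clear variables that would point git's global config elsewhere
        unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
        HOME="$temp_home"
        export HOME
        "$@"
    )
    status=$?
    # The temporary config (and any credentials in it) is discarded
    rm -rf "$temp_home"
    return "$status"
}
```

For example: `with_temp_home sh -c 'git config --global url.https://github.com/.insteadOf git@github.com: && git submodule update --init --depth=1 --recursive'`. The subshell keeps the HOME change from leaking into the rest of your script.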
If you’d like to understand a little more about all of that auth stuff and what your options are (your auth needs for your deploys may vary significantly from what GitHub does internally), check out The Vanilla DevOps Git Credentials & Private Packages Cheatsheet.
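As for the `<encoded-personal-access-token>` placeholder in the steps above: as far as I can tell, it is an ordinary HTTP basic auth credential, meaning the base64 of `username:token`. The sketch below assumes the username `x-access-token`, which is what actions/checkout appears to use; `basic_auth_header` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build the value for git's http.<url>.extraheader setting.
# Assumption: the token is sent as HTTP basic auth with the username
# "x-access-token".
# Usage: basic_auth_header <token>
basic_auth_header() {
    token="$1"
    # tr strips the line wrapping that some base64 implementations add
    encoded="$(printf 'x-access-token:%s' "$token" | base64 | tr -d '\n')"
    printf 'AUTHORIZATION: basic %s\n' "$encoded"
}
```

You could then set it with something like `git config --local http.https://github.com/.extraheader "$(basic_auth_header "$GITHUB_TOKEN")"`, where `GITHUB_TOKEN` is just an assumed variable name for wherever you keep the token.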