19

The naive approach is find dir1 dir2 dir3 -type d -name .git | xargs -I {} dirname {}, but it's too slow for me, because I have a lot of deep folder structures inside git repositories (at least I think that this is the reason). I've read that I can use -prune to prevent find from recursing into directories once it has found something, but there are two problems. First, I'm not sure how it works (I don't understand what -prune does even though I've read the man page). Second, it wouldn't work in my case, because it would prevent find from recursing into the .git folder but not into all the other folders.

So what I actually need is:

For every subdirectory, check whether it contains a .git folder; if it does, stop searching in this branch of the filesystem and report the result. It would be perfect if this also excluded any hidden directories from the search.

7 Answers

16

Okay, I'm still not totally sure how this works, but I've tested it and it works.

.
├── a
│   ├── .git
│   └── a
│       └── .git
└── b
    └── .git

6 directories, 0 files

% find . -type d -exec test -e '{}/.git' ';' -print -prune
./a
./b

I'm looking forward to making this faster.

  • Think of -prune this way: you start at the root of a tree and move down it, and when a certain condition applies you cut off a whole subtree (like real "pruning"), so you won't look at any more nodes in that subtree. – phk Dec 31 '16 at 00:51
  • @phk oh, thanks. I think I grasp it now. We're searching for directories (-type d) for which the condition test -e ... is true, and if it is true we execute the actions -print -prune, which means print it and cut the subtree, right? – user1685095 Dec 31 '16 at 01:05
  • Yes, we cut the subtree of which it is the root. – phk Dec 31 '16 at 01:08
  • Quick one to use your solution to "update" all git repos: find . -type d -exec test -e '{}/.git' \; -print -prune | parallel cd "{}" \&\& git pull --rebase (GNU parallel is a very handy replacement for xargs). – Marcello Romani Sep 10 '18 at 08:31
  • You will not get submodules this way, and they are also git repos. You might want to fetch them by recursively fetching submodules once you have the list of root repos returned by this command. – hoijui Apr 15 '20 at 05:34
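To make the pruning described in these comments concrete, here is a small sketch that rebuilds the sample tree from the answer in a temporary directory and runs the same find command with and without -prune. The variable names are only for illustration, and it assumes a find that expands {} inside a longer argument (GNU and busybox find do; POSIX does not require it).

```shell
#!/bin/sh
# Rebuild the sample tree from the answer in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/a/.git" "$demo/a/a/.git" "$demo/b/.git"

# With -prune: once a directory containing .git is printed, find does not
# descend into it, so the nested repository a/a is never visited.
with_prune=$(cd "$demo" &&
    find . -type d -exec test -e '{}/.git' ';' -print -prune | sort)

# Without -prune: find keeps descending, so the nested repository shows up too.
without_prune=$(cd "$demo" &&
    find . -type d -exec test -e '{}/.git' ';' -print | sort)

printf 'with -prune:\n%s\n' "$with_prune"
printf 'without -prune:\n%s\n' "$without_prune"
rm -rf "$demo"
```

The only difference between the two runs is the trailing -prune, which is what keeps ./a/a out of the first result.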
4

Ideally, you'd want to crawl directory trees for directories that contain a .git entry and stop searching further down those (assuming you don't have further git repos inside git repos).

The problem is that with standard find, doing this kind of check (that a directory contains a .git entry) involves spawning a process that executes a test utility using the -exec predicate, which is going to be less efficient than listing the content of a few directories.

An exception would be if you use the find builtin of the bosh shell (a POSIXified fork of the Bourne shell developed by @schily) which has a -call predicate to evaluate code in the shell without having to spawn a new sh interpreter:

#! /path/to/bosh -
find . -name '.?*' -prune -o \
  -type d -call '[ -e "$1/.git" ]' {} \; -prune -print

Or use perl's File::Find:

perl -MFile::Find -le '
  sub wanted {
    if (/^\../) {$File::Find::prune = 1; return}
    if (-d && -e "$_/.git") {
       print $File::Find::name; $File::Find::prune = 1
    }
  }; find \&wanted, @ARGV' .

Longer, but faster than zsh's printf '%s\n' **/.git(:h) (which descends into all non-hidden directories), or GNU find's find . -name '.?*' -prune -o -type d -exec test -e '{}/.git' \; -prune -print which runs one test command in a new process for each non-hidden directory.

2022 edit. The find applet from recent versions of busybox is able to run its [ or test applet without having to fork a process and reexecute itself inside, so, even though it's still not as fast as the bosh or perl approaches:

busybox find . -type d -exec [ -e '{}/.git' ] ';' -prune -print

In my test, it is roughly 50 times faster than the GNU find equivalent: on a local sample containing a mix of git / cvs / svn repositories with over 100000 directories in total, I get 0.25s for bosh, 0.3s for perl, 0.7s for busybox find, 36s for GNU find, and 2s for GNU find . -name .git -printf '%h\n' (which gives a different result, as it also finds .git files in subdirs of git repositories).
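Since the cost discussed above comes from starting one test process per directory, another option is to do the walk in the shell itself, where [ is a builtin. The following is only a sketch under that assumption, not something I have benchmarked against the numbers above; find_repos is a made-up helper name. It relies on the fact that the glob */ does not match hidden directories, so those are skipped for free.

```shell
#!/bin/sh
find_repos() {
    # $1: directory to examine (hypothetical helper, not part of find)
    if [ -e "$1/.git" ]; then
        printf '%s\n' "$1"   # report the repository root ...
        return 0             # ... and do not look any deeper
    fi
    # "*/" expands to non-hidden subdirectories (and symlinks to directories).
    # The list is expanded once, before the loop starts, so the recursive
    # calls below cannot disturb it.
    for entry in "$1"/*/; do
        entry=${entry%/}
        # Skip the unexpanded pattern (no subdirectories) and symlinks.
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            find_repos "$entry"
        fi
    done
}
```

Called as find_repos . (or once per start directory), it prints each directory that contains a .git entry and never descends below one.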

3

If you use locate, you could find directories with:

locate .git | grep '/\.git$'

The result list comes back quickly, and further processing is easy, too.
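As an example of that further processing, here is a minimal sketch that turns a list of paths into repository roots. repo_roots_from_paths is a hypothetical helper name; it only filters text on stdin, so it works with whatever produces the path list. Keep in mind that locate answers from its database, so repositories created since the last database update will be missing.

```shell
#!/bin/sh
repo_roots_from_paths() {
    # Reads one path per line on stdin. Keeps only the paths whose last
    # component is exactly ".git" and prints them with that component removed.
    sed -n 's|/\.git$||p'
}

# Typical use (needs an up-to-date locate database):
#   locate .git | repo_roots_from_paths
```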

Jarivaa
2

Possible Solution

For GNU find and other implementations that support -execdir:

find dir1 dir2 dir3 -type d -execdir test -d '{}/.git' \; -print -prune

(see the comments)

Previously discussed stuff

Solution if pruning below .git is enough

find dir1 dir2 dir3 -type d -path '*/.git' -print -prune | xargs -I {} dirname {}

If -printf '%h' is supported (as in the case of GNU's find) we don't need dirname:

find dir1 dir2 dir3 -type d -path '*/.git' -printf '%h\n' -prune

Once it comes across a folder .git in the current path it will output it and then stop looking further down the subtree.

Solution if the whole folder tree should be pruned once a .git is found

Using -quit if your find supports it:

for d in dir1 dir2 dir3; do
  find "$d" -type d -name .git -print -quit
done | xargs -I {} dirname {}

(According to this detailed post by Stéphane Chazelas -quit is supported in GNU's and FreeBSD's find and in NetBSD as -exit.)

Again with -printf '%h' if supported:

for d in dir1 dir2 dir3; do
  find "$d" -type d -name .git -printf '%h\n' -quit
done

Solution for pruning at the same level as where the .git folder is

See the "Possible Solution" part for the current solution for this particular problem.

(Oh and obviously the solutions using xargs assume there are no newlines in the paths, otherwise you would need null-byte magic.)
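For completeness, the "null-byte magic" could look roughly like this. It is a sketch that assumes your find has -print0 and your xargs has -0 (GNU and BSD versions do, but neither is required by older POSIX), and list_repo_parents is just a name made up for the example.

```shell
#!/bin/sh
list_repo_parents() {
    # "$@": directories to search.
    # -print0 ends each path with a NUL byte instead of a newline, and
    # xargs -0 splits on NUL, so a newline inside a path is passed through
    # to dirname as part of a single argument.
    find "$@" -type d -name .git -prune -print0 | xargs -0 -I {} dirname {}
}
```

Note that dirname still writes newline-terminated output, so this only makes the find-to-xargs hand-off safe; anything that consumes the final list line by line has the same newline problem again.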

phk
  • if dir1 contains two directories dirx and diry that each contain a .git directory, this only reports dirx/.git – iruvar Dec 30 '16 at 20:15
  • @iruvar Ah OK, I misunderstood you in that case, I will try to redo the solution then. – phk Dec 30 '16 at 20:18
  • The issue with your new solution is this: if dir1/.git exists, it still descends into dir1/dirx, which, based on my reading of OP's requirement, is not desired – iruvar Dec 30 '16 at 20:55
  • @iruvar OK, added that as well. Any other ideas about what OP could have meant? ;-) – phk Dec 30 '16 at 21:39
  • @iruvar exactly – user1685095 Dec 30 '16 at 22:25
  • @user1685095 OK, then the last solution is for you. I will still leave the others there, might be useful for someone else. – phk Dec 30 '16 at 22:27
  • @phk well... first of all, using find -mindepth 1 -maxdepth 1... to check whether there is a .git folder is very inefficient. Second, this is just a for loop without any parallelization, and I've said I need this to be fast. And lastly, this can be done in just one line: find d1 d2 d3 -type d -exec test -e '{}/.git' ';' -print -prune, but I'm not sure it's fast enough for me. Is there a way to run -exec in parallel? It seems to be the performance bottleneck. – user1685095 Dec 30 '16 at 23:26
  • @user1685095 You can use xargs -P if your implementation has it or alternatively one of the tools called parallel (one is AFAIK Python code though). But in a sense I'm not sure it would be faster this way, because I don't think CPU is much of a bottleneck here. – phk Dec 30 '16 at 23:32
  • @phk I know about xargs and parallel. The problem is that, if I understand correctly, -exec not only executes the command on the search results but also affects -prune. I mean, the reason find doesn't descend deeper is that test -e returned true, and xargs can't do that. – user1685095 Dec 30 '16 at 23:40
  • @phk CPU isn't the bottleneck, but disk I/O and kernel calls are. And if it can be done in parallel, that would speed things up, right? – user1685095 Dec 30 '16 at 23:41
  • @user1685095 Not sure, would have to benchmark it. BTW, what about the latest solution? – phk Dec 30 '16 at 23:53
  • @phk well, I've already told you, it's inefficient and not nice. – user1685095 Dec 30 '16 at 23:54
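On the parallelism question in these comments: one way to keep the -exec/-prune interplay intact and still run things concurrently is to start one complete find per start directory and let xargs -P run several of them at once. This is only a sketch; find_repos_parallel and the limit of 4 jobs are arbitrary choices for the example, -P is not available in every xargs, and whether it is actually faster depends on the disk, so it would need benchmarking as said above.

```shell
#!/bin/sh
find_repos_parallel() {
    # "$@": start directories. Each one gets its own find process, with at
    # most 4 running at the same time.
    # xargs is used without -I on purpose: with -I {} it would also replace
    # the {} that is meant for find's -exec.
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P 4 sh -c \
            'find "$1" -type d -exec test -e "{}/.git" \; -print -prune' sh
}
```

Usage would be find_repos_parallel dir1 dir2 dir3. The parallelism is only as fine-grained as the list of start directories, and the output order is not deterministic, so pipe it through sort if the order matters.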
1

You could also use ls -d */.git and then strip the .git at the end.

This won't look in subfolders.
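A sketch of the same idea with the stripping done by the shell instead of a second tool (list_child_repos is a made-up name). Like ls -d */.git, it only looks one level down and skips hidden directories, because * does not match names starting with a dot.

```shell
#!/bin/sh
list_child_repos() {
    # $1: parent directory (defaults to the current directory)
    for g in "${1:-.}"/*/.git; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        if [ -e "$g" ]; then
            printf '%s\n' "${g%/.git}"   # strip the trailing /.git
        fi
    done
}
```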

0

Use

find ~/GIT-REPOSITORIES \( -exec test -d '{}'/.git \; \) -print -prune

Time this to see the difference with and without -prune.

This is based on an example in the find man page. You can edit out the CVS and svn checks if they are not required. The man page content follows:

find repo/ \( -exec test -d '{}'/.svn \; -or \
       -exec test -d {}/.git \; -or -exec test -d {}/CVS \; \) \
       -print -prune

Given the following directory of projects and their associated SCM administrative directories, perform an efficient search for the projects' roots:

repo/project1/CVS
repo/gnu/project2/.svn
repo/gnu/project3/.svn
repo/gnu/project3/src/.svn
repo/project4/.git

In this example, -prune prevents unnecessary descent into directories that have already been discovered (for example, we do not search project3/src, because we already found project3/.svn), but ensures sibling directories (project2 and project3) are found.
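To try the man page example without touching real projects, the layout above can be recreated in a temporary directory. This is just a sketch: the demo and roots variables are mine, I used the portable -o instead of -or, and I start find at repo without the trailing slash to keep the printed paths the same across find implementations.

```shell
#!/bin/sh
# Recreate the directory layout from the man page example.
demo=$(mktemp -d)
mkdir -p "$demo/repo/project1/CVS" \
         "$demo/repo/gnu/project2/.svn" \
         "$demo/repo/gnu/project3/.svn" \
         "$demo/repo/gnu/project3/src/.svn" \
         "$demo/repo/project4/.git"

# Print every directory that holds an SCM administrative directory and
# do not descend below it (so project3/src is never examined).
roots=$(cd "$demo" &&
    find repo \( -exec test -d '{}/.svn' \; -o \
                 -exec test -d '{}/.git' \; -o \
                 -exec test -d '{}/CVS' \; \) -print -prune | sort)

printf '%s\n' "$roots"
rm -rf "$demo"
```

The four project roots are listed, while repo/gnu/project3/src is not, which is the effect of -prune described above.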

Paulo Tomé
0

Here is my solution to show all repos and their current branches:

for d in $(find . -name HEAD -prune | grep 'git/HEAD$' | sort); do echo "$d" | sed 's/\.git\/HEAD$//'; cat "$d"; echo; done

Result:

path1
ref: current_branch_of_repo

path2
ref: current_branch_of_repo

path3
ref: current_branch_of_repo
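A variant of the same idea, as a sketch: it finds the .git directories directly, prunes below them, and prints one "path, tab, branch" line per repository by reading the first line of .git/HEAD. show_branches is a hypothetical name, and the approach assumes .git is a directory (it skips worktrees and submodules where .git is a file) and that paths contain no newlines. For a detached HEAD, the line shows the commit hash instead of a branch name.

```shell
#!/bin/sh
show_branches() {
    # $1: directory to search (defaults to the current directory)
    find "${1:-.}" -type d -name .git -prune -print | sort |
    while IFS= read -r gitdir; do
        [ -f "$gitdir/HEAD" ] || continue
        # The first line of HEAD is usually "ref: refs/heads/<branch>".
        IFS= read -r head < "$gitdir/HEAD"
        printf '%s\t%s\n' "${gitdir%/.git}" "${head#ref: refs/heads/}"
    done
}
```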