2

I am trying to find all directories in a folder recursively while excluding all git submodules, that is, every path that contains a .git file. How can I do it?


Explanation:

A .git file exists at the root of every submodule folder. A submodule folder can be located anywhere in the tree.


Test Case

$ mkdir Test
$ cd Test
$ mkdir a
$ mkdir b
$ mkdir c
$ cd a
$ mkdir .git
$ cd ..
$ cd b
$ touch .git
$ cd ..
$ cd c
$ mkdir c1
$ mkdir c2
$ cd ..
$ find . -type d \( \( ! -name . -exec [ -e {}/.git ] \; -prune \) -o \( \( \
 -name .git\
 -o -name .vscode\
 -o -name node_modules\
 -o -name Image\
 -o -name Rendered\
 -o -name iNotebook\
 -o -name GeneratedTest\
 -o -name GeneratedOutput\
  \) -prune \) -o -print \) | sort 

Expected Results

.
./a
./c
./c/c1
./c/c2
Porcupine
  • 1,892
  • No, I meant .git files, as that is the case for submodules – Porcupine Sep 13 '18 at 08:44
  • Excluding the .git folder is not a problem, as it can easily be done like this: find "$(pwd)" -not \( -path "*/.git" \) -type d – Porcupine Sep 13 '18 at 08:46
  • A .git file exists at the root of every submodule folder. A submodule folder could be included anywhere – Porcupine Sep 13 '18 at 08:47
  • But I thought you want to exclude folders containing .git... So your find doesn't work for that even if it's a folder. – pLumo Sep 13 '18 at 08:48

4 Answers

6

find actions are also tests, so you can add tests using -exec:

find . \( -exec [ -f {}/.git ] \; -prune \) -o \( -name .git -prune \) -o -print

This applies three sets of actions:

  • -exec [ -f {}/.git ] \; -prune prunes directories containing a file named .git
  • -name .git -prune prunes directories named .git (so the command doesn’t search inside the main .git directory of a repository)
  • -print prints anything which isn’t caught by the above.
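
As a quick check of the three branches, here is a minimal sketch that rebuilds the question's test layout in a temporary directory and runs the command on it. The function name list_outside_submodules and the variable names are made up for illustration, and the sketch assumes a find that substitutes {} inside a longer argument such as {}/.git (GNU find does).

```shell
#!/bin/sh
# Hypothetical helper: list everything under "$1" except submodule roots
# (directories holding a regular file named .git) and .git directories.
list_outside_submodules() {
    find "$1" \( -exec [ -f {}/.git ] \; -prune \) -o \( -name .git -prune \) -o -print
}

# Rebuild the question's test layout in a throwaway directory:
# a/.git is a directory (normal repository), b/.git is a file (submodule).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/a/.git" "$demo_root/b" "$demo_root/c/c1" "$demo_root/c/c2"
touch "$demo_root/b/.git"

# Run from inside the tree so the paths are relative, as in the question.
demo_output=$(cd "$demo_root" && list_outside_submodules . | LC_ALL=C sort)
printf '%s\n' "$demo_output"
rm -rf "$demo_root"
```

On that layout the command should list ., ./a, ./c, ./c/c1 and ./c/c2, which matches the expected results in the question.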

To only match directories, add -type d, either just before -print, or (to save time processing files):

find . -type d \( \( -exec [ -f {}/.git ] \; -prune \) -o \( -name .git -prune \) -o -print \)

This also works when run on a directory other than ., by changing the find start path:

find /some/other/path -type d \( \( -exec [ -f {}/.git ] \; -prune \) -o \( -name .git -prune \) -o -print \)
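
A hedged aside on portability: POSIX leaves it implementation-defined whether find replaces {} when it is only part of an argument, as in {}/.git. The sketch below is one way around that, not part of the original answer: it passes the directory to a small sh -c test as "$1". The function name find_dirs_portable is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same logic as the command above, but {} is passed
# as a separate argument and the /.git suffix is added inside sh.
find_dirs_portable() {
    find "$1" -type d \( \
        \( -exec sh -c '[ -f "$1/.git" ]' sh {} \; -prune \) \
        -o \( -name .git -prune \) \
        -o -print \)
}
```

It would be called as find_dirs_portable . or find_dirs_portable /some/other/path. The cost is one extra sh process per directory visited.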
Stephen Kitt
  • 434,908
  • 1
    This is really cool. And I thought my solution is clever ;-) – pLumo Sep 13 '18 at 09:13
  • @Stephen Kitt You mentioned -name .git -prune to exclude directories. Is it better to use -not -path "*/.git/*" – Porcupine Sep 13 '18 at 09:39
  • 1
    @Nikhil -not -path "*/.git/*" checks all the files and directories inside .git, which can take a little while, whereas -name .git -prune avoids descending into the directory at all; the latter is more efficient. – Stephen Kitt Sep 13 '18 at 09:49
  • @StephenKitt I found one semantic issue. \( ! -name . -exec [ -e {}/.git ] \; -prune \) excludes parent folders containing a .git file as well as those containing a .git folder. A .git file is present in submodules, but a .git folder is present in a normal repository, and we don't want to exclude normal repositories. – Porcupine Sep 15 '18 at 19:31
  • @StephenKitt With the Test (See EDIT) your solution does not give desired results. It skips folder a – Porcupine Sep 15 '18 at 19:38
  • @Nikhil ah yes, I missed the strict “file” requirement. See my updated answer. – Stephen Kitt Sep 15 '18 at 20:58
  • @StephenKitt In "command line - What is the difference between find . and find . -print" (Unix & Linux Stack Exchange) you mentioned that find . and find . -print are equivalent. Is it the same for this case as well? Can we skip -print here? – Porcupine Sep 16 '18 at 09:25
  • 1
    @Nikhil no: -print is the default action, used when no other expression is given. Here the find command includes other expressions (-exec, -prune, and -name) so we need to specify -print explicitly. Try removing -print: the find command won’t output anything. find . and find . -print are equivalent as complete commands, not as portions of commands. – Stephen Kitt Sep 16 '18 at 13:14
  • @StephenKitt Using your example, I tried to find index.md while excluding paths with a .git file and the Rendered directory, like this: find . -type f \( \( -exec [ -f {}/.git ] \; -prune \) -o \( \( -name "Rendered" \) -prune \) -o -name 'index.md' \) -print | sort. But it doesn't prune Rendered. How should I modify this? – Porcupine Sep 22 '18 at 22:49
  • 1
    @Nikhil your very first test in that command limits everything else to files, so you’ll never prune any directory. You need to split the tests up to prune directories and then look for index.md: find . \( -type d \( -exec [ -f {}/.git ] \; -o -name "Rendered" \) -prune \) -o \( -type f -name index.md -print \). – Stephen Kitt Sep 22 '18 at 23:01
  • @StephenKitt I get same error with: PruneDir="\( \( -exec [ -f {}/.git ] \; -prune \) -o \( \( -name .git -o -name 'Data' \) -prune \) -o -print \)"; find . -maxdepth 5 -type d ${PruneDir:-}; Error: find: paths must precede expression: \(' – Porcupine Apr 05 '19 at 13:14
  • @Nikhil right, you also need to fix the escaping of your brackets and semi-colons: PruneDir="( ( -exec [ -f {}/.git ] ; -prune ) -o ( ( -name .git -o -name 'Data' ) -prune ) -o -print )"; find . -maxdepth 5 -type d ${PruneDir:-}; – Stephen Kitt Apr 05 '19 at 13:17
  • @StephenKitt I executed as you said (please see the Gist: Find With Prune. ) But, the prune is not pruning out the folders. – Porcupine Apr 05 '19 at 13:30
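
One possible reason the variable-based attempts in the last comments misbehave (an assumption, since the gist itself is not shown here): quotes and backslashes stored inside a string are not interpreted again when the variable is expanded, so 'Data' reaches find with its quote characters and \( arrives as a literal two-character word. A minimal sketch that avoids this keeps the expression as separate words in the positional parameters. The function name find_pruned_dirs is hypothetical, and -maxdepth is taken from the comment (it is a GNU/BSD extension).

```shell
#!/bin/sh
# Hypothetical helper: build the expression with set --, one word per
# argument, so no quoting has to survive a variable expansion.
find_pruned_dirs() {
    start=$1
    set -- \( \
        \( -exec [ -f {}/.git ] \; -prune \) \
        -o \( \( -name .git -o -name Data \) -prune \) \
        -o -print \)
    find "$start" -maxdepth 5 -type d "$@"
}
```

Called as find_pruned_dirs . this should skip submodule roots, .git directories and any directory named Data. In bash, an array would serve the same purpose as the positional parameters.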
3

We can create a recursive find:

Add the following lines to a script file:

#!/bin/bash
if [ ! -f "$1"/.git ]; then
    echo "$1"
    find "$1" -mindepth 1 -type d -prune -exec "$0" {} \;
fi

I named the file findifnotgit but it doesn't matter. Then make it executable

chmod u+x findifnotgit

Then run it with the path you want to search as its argument:

./findifnotgit .

--> . for current dir

or

./findifnotgit /path/to/search/

Explanation:

  • if [ ! -f "$1"/.git ]; then ... fi only runs the enclosed commands when there is no .git file inside the current folder ($1).
  • We need the -mindepth 1 option so that find does not return the folder we started with, which would create an infinite loop.
  • We need -prune so that find does not descend into directories. We do that ourselves through -exec.
  • -exec "$0" {} \; calls the same script ($0) on each directory found.
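
For comparison, the same recursion can be sketched as a shell function, which avoids starting a new script process for every directory. This is an alternative under stated assumptions, not the script above: it uses shell globbing instead of find to list subdirectories, and the function name walk_outside_submodules is made up.

```shell
#!/bin/sh
# Hypothetical function: print "$1" and recurse into its subdirectories,
# unless "$1" holds a regular file named .git (a submodule root).
walk_outside_submodules() {
    if [ -f "$1/.git" ]; then
        return 0
    fi
    printf '%s\n' "$1"
    # The three patterns cover ordinary names and dot-names, skipping . and ..
    for entry in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        # Unmatched patterns stay literal and fail the -d test;
        # symbolic links are skipped, as find -type d would skip them.
        if [ -d "$entry" ] && [ ! -L "$entry" ]; then
            walk_outside_submodules "$entry"
        fi
    done
}
```

Like the script, this also lists .git directories of normal repositories (for example ./a/.git in the question's test case), so those would still need to be filtered out if they are unwanted.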
pLumo
  • 22,565
0

Bit of a dirty script, but this will find all directories that don't contain a .git file in them:

#!/bin/bash

# find dirs that contain a .git file and store them in an array
mapfile -t exclude_dirs < <(find . -type f -iname ".git" -exec dirname {} \;)

# set up the base command as an array, so no eval is needed
command=(find . -type d)

# loop over each element in the array
for dir in "${exclude_dirs[@]}"; do
    command+=(-not -path "$dir")
done

# run the find
"${command[@]}"

edit:

fixed syntax errors and updated *.git to .git

edit2:

yep that was wrong, my apologies. Edited so it actually works now.

RobotJohnny
  • 1,039
  • 8
  • 18
0

Here is how to tell find not to look inside the .git or .hg repositories.

find .  \( -iname '.git' -o -iname '.hg' \) -prune -false  -o -iname  '*thing-i-am-looking for*'
  • 1
    do you know something I don't know ? what is bookmarker, why .hg? Also I thought OP wants to find folders and not anything called *thing-i-am-looking for* ? – pLumo Sep 13 '18 at 08:54
  • Sorry left in file name from test, and I included two things to ignore, so one can see how to extend it. – ctrl-alt-delor Sep 13 '18 at 08:57