28

I have a find command that displays the files in my project:

find . -type f -not -path './node_modules*' -a -not -path '*.git*' \
       -a -not -path './coverage*' -a -not -path './bower_components*' \
       -a -not -name '*~'

How can I filter the files so that it doesn't show the ones listed in .gitignore?

I thought I could use:

while read file; do
    grep $file .gitignore > /dev/null && echo $file;
done

but a .gitignore file can contain glob patterns (and this also won't work with paths when only the file name is in .gitignore). How can I filter files based on patterns that may contain globs?

Jeff Schaller
jcubic
  • man find says: "-path pattern ... To ignore a whole directory tree, use -prune" so you want something like find . -name .git -prune -o -name node_modules -prune -o -type f (-o means "or") – milahu Apr 02 '21 at 06:42
  • @MilaNautikus the question was about gitignore not about find, that was just example. – jcubic Apr 02 '21 at 08:38

6 Answers

17

git provides git-check-ignore to check whether a file is excluded by .gitignore.

So you could use:

find . -type f -not -path './node_modules*' \
       -a -not -path '*.git*'               \
       -a -not -path './coverage*'          \
       -a -not -path './bower_components*'  \
       -a -not -name '*~'                   \
       -exec sh -c '
         for f do
           git check-ignore -q "$f" ||
           printf "%s\n" "$f"
         done
       ' find-sh {} +

Note that this is expensive, because git check-ignore is run once for every file.
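
One way to avoid the per-file cost is to hand every path to a single git check-ignore process over standard input. The following is only a minimal sketch and not part of the original answer: the function name list_unignored is made up for illustration, and it assumes it is run inside a Git working tree with file names that contain no newlines or other characters that Git would quote in its output.

```sh
# Print files under the current directory that no ignore rule matches.
# Hypothetical helper; assumes it is run inside a Git working tree.
list_unignored() {
    # Skip the .git directory entirely, then print every regular file.
    find . -name .git -prune -o -type f -print |
        # With --verbose --non-matching, git prints one line per input path;
        # paths that match no pattern are printed as "::<TAB><path>".
        git check-ignore --verbose --non-matching --stdin |
        # Keep only the non-matching lines and drop the empty "::" prefix.
        awk -F'\t' '$1 == "::" { print $2 }'
}
```

Calling list_unignored from the top of the project should then print the same kind of list as the find command above, but with only one git process started for the whole tree.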

cuonglm
16

To show files that are in your checkout and that are tracked by Git, use

$ git ls-files

This command has a number of options for showing, e.g. cached files, untracked files, modified files, ignored files etc. See git ls-files --help.
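
As a rough illustration of those options, the sketch below wraps three common combinations in small functions. The function names are invented for this example; the flags themselves (--others, --ignored, --exclude-standard) are documented in git ls-files --help. It assumes the commands are run inside a Git working tree.

```sh
# Hypothetical wrappers around git ls-files; the names are illustrative only.

# Files that are tracked (present in the index).
tracked_files() {
    git ls-files
}

# Untracked files that are NOT ignored by .gitignore,
# .git/info/exclude or the user's global excludes file.
untracked_files() {
    git ls-files --others --exclude-standard
}

# Untracked files that ARE ignored by those same rules.
ignored_files() {
    git ls-files --others --ignored --exclude-standard
}
```

For the original question, the output of tracked_files followed by untracked_files is roughly "everything that .gitignore does not hide".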

Kusalananda
  • I also thought about git ls-files, but I don't think it has an option to show untracked files. Which one would that be? – cuonglm Apr 11 '17 at 10:41
  • @cuonglm -o (other). E.g., git ls-files -o -X .gitignore – Kusalananda Apr 11 '17 at 10:42
  • Ah yes, but it can't handle the case where a file was tracked before and was later included in .gitignore. – cuonglm Apr 11 '17 at 10:47
  • Minor note: this does not list the actual files from the system but the files included in the git history. This can be a subtle difference on case-insensitive systems (e.g. Windows shares). Say, I committed ./foo.txt but then rename it to ./Foo.txt. Some git clients will not recognize this change and git ls-files outputs ./foo.txt while find would output ./Foo.txt – Gerrit-K Aug 29 '17 at 08:07
15

There is a git command for doing exactly this, git grep. For example:

my_git_repo % git grep --line-number TODO
desktop/includes/controllers/user_applications.sh:126:  # TODO try running this without sudo
desktop/includes/controllers/web_tools.sh:52:   TODO: detail the actual steps here:
desktop/includes/controllers/web_tools.sh:57:   TODO: check if, at this point, the menurc file exists. i.e. it  was created

It does a basic grep with most of the normal grep options. By default it searches only tracked files, so it does not look inside .git or at untracked files and folders that your .gitignore file excludes.
For more details, see man git-grep.

Submodules:

If you have other git repos inside this git repo (they should be set up as submodules), you can use the --recurse-submodules flag to search in the submodules as well.

KNejad
6

I think this works well:

git ls-files --cached --modified --other --exclude-standard

If you also want to recurse into submodules, add --recurse-submodules.
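
Two caveats may be worth handling, sketched below under some assumptions. As far as I can tell, combining --cached with --modified can list a modified file more than once, and --cached still lists tracked files that have been deleted from the working tree. The helper name list_project_files is hypothetical, and the loop assumes file names without newlines or characters that Git would quote.

```sh
# Hypothetical helper: list tracked and untracked-but-not-ignored files,
# once each, keeping only those that still exist on disk.
list_project_files() {
    git ls-files --cached --modified --others --exclude-standard |
        # Collapse any entries that were listed more than once.
        sort -u |
        while IFS= read -r f; do
            # Drop paths that are in the index but missing from the
            # working tree; -L keeps dangling symlinks in the list.
            if [ -e "$f" ] || [ -L "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
}
```

The sort -u step is harmless if your Git version never prints duplicates, so it is kept here as a cheap safeguard rather than a required fix.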

Mitar
2

You can use an array, letting bash perform glob expansion on the patterns as the array is filled.

Given files like these:

touch file1 file2 file3 some more file here

and an ignore file like this:

cat <<EOF >ignore
file*
here
EOF

Using

arr=($(cat ignore));declare -p arr

results in this:

declare -a arr='([0]="file" [1]="file1" [2]="file2" [3]="file3" [4]="here")'

You can then use any technique to manipulate those data.

I personally prefer something like this:

awk 'NR==FNR{a[$1];next}(!($1 in a))'  <(printf '%s\n' "${arr[@]}") <(find . -type f -printf %f\\n)
#Output
some
more
ignore
  • gitignore is a superset of bash glob; you will miss something like !file* – cuonglm Apr 11 '17 at 10:29
  • @cuonglm What do you mean by !file* ? As a glob pattern in ignore file or as a filename? – George Vasiliou Apr 11 '17 at 10:37
  • gitignore accepts file patterns like !file* to exclude files starting with file, and also the double star ** – cuonglm Apr 11 '17 at 10:39
  • .gitignore has a bunch of features, all of which might not be that easy to implement directly in Bash: https://git-scm.com/docs/gitignore . Though ignores consisting of just simple globs should be doable in plain Bash, without awk or such – ilkkachu Apr 11 '17 at 10:42
  • @cuonglm You mean translation of ! as not ? Maybe extended globing can handle this situation. About doublestar , bash can handle it with shopt -s globstar. – George Vasiliou Apr 11 '17 at 10:43
  • No, you can't do it even with extended glob. it's !(pattern) in bash, not !pattern. And also for double star, like !**, /** – cuonglm Apr 11 '17 at 10:56
  • @cuonglm You are right about extglob. It has to be like !(pattern) and not !pattern. On the other hand, using !file in git maybe this means that git tools will fail to catch filenames starting with ! ? – George Vasiliou Apr 11 '17 at 11:05
  • @GeorgeVasiliou then escape ! to match ! literally. – cuonglm Apr 11 '17 at 11:08
  • @cuonglm OK. In git tools someone needs to escape ! , in bash tools we need to add ( ) . In both cases you need a kind of "processing". In any case, since we talk about .gitignore there is no doubt that git tools should and will perform better than bash. – George Vasiliou Apr 11 '17 at 11:10
0

This solution uses metaprogramming heavily, but it is much faster than the accepted answer because it avoids repeated shell-command calls.

Replace the sed deletion patterns with a substantial subset of your own known exclude list for entries at the top of the working directory (those starting with ./). Put the patterns that start with * into the sed substitution, as in a normal find invocation, with proper escaping.

FILES="$( \
  find . -mindepth 1 -maxdepth 1 \
  | sed -e '/\/\.git/d' -e '/\/node_modules/d' \
      -e 's/.*/find '"'&'"' -type f/' \
  | sh \
)"
# >&2 echo debug: $FILES

HUGE_SED_COMMAND="$( \
  echo '/^('$( \
    git check-ignore --stdin <<<"$FILES" \
    | sed 's/\./\\./g; s|/|\\/|g' \
  )')$/d' \
  | sed 's/ /|/g' \
)"

>&2 echo debug: $HUGE_SED_COMMAND

sed -E "${HUGE_SED_COMMAND}" <<<"$FILES"

annahri