With GNU grep, pass --binary-files=without-match
to ignore binary files. Source code files are text files, so they will be included in the results.
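As a quick check of that behavior, here is a minimal sketch assuming GNU grep. The directory and the file names notes.txt and blob.bin are made up for illustration. It creates one text file and one file containing a NUL byte, then lists the files that match:

```shell
# Build a throwaway directory with one text file and one binary file
# (hypothetical names, used only for this demonstration).
dir=$(mktemp -d)
printf 'needle in plain text\n' > "$dir/notes.txt"
printf 'needle\000binary payload\n' > "$dir/blob.bin"

# --binary-files=without-match (short form: -I) makes grep treat
# binary files as if they contained no match at all.
matches=$(grep -r --binary-files=without-match -l -e needle "$dir")
printf '%s\n' "$matches"

# Clean up the temporary directory.
rm -rf "$dir"
```

Only the text file should be listed. Without the option, GNU grep would typically also report that the binary file matches.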
If you want to ignore text files with certain extensions, you can use the --exclude
option, e.g.
grep -r --exclude='*.html' --exclude='*.js' …
or you can instead include only explicitly-matching files, e.g.
grep -r --include='*.txt' …
If you want to ignore text files that are source code, you can use the file
command to guess which files are source code. This uses heuristics so it may detect source code as non-source-code or vice versa.
find . -type f -exec sh -c '
  for x do
    case $(file - <"$x") in
      *source*) :;; # looks like source code
      *text*) grep -H -e "$0" "$x";; # looks like text
      # else: looks like binary
    esac
  done
' "REGEXP" {} +
or
find . -type f -exec sh -c '
  for x do
    # -b omits the "/dev/stdin:" prefix so the pattern matches from the start
    case $(file -bi - <"$x") in
      text/plain\;*) grep -H -e "$0" "$x";; # looks like text
      # else: looks like source code or binary
    esac
  done
' "REGEXP" {} +
Alternatively, you may use ack instead of grep. Ack integrates a file classification system based on file names. It's geared towards searching in source code by default, but you can tell it to search different types by passing the --type option. See also "Search ALL files with ack".
"Binary file matches", which I don't even want. For matches in js and html files, the outputs are too long because their contents are long, making matches in plain text files difficult to see. – Tim Mar 15 '15 at 01:04