It can at least be simplified to:
set -f # needed if you're using the split+glob operator and don't want the
       # glob part
for key in $(cat /tmp/listOfKeys.txt); do
  grep -riFqe "$key" . ||
    printf '%s\n' "$key has no occurrence"
done
That stops searching as soon as the first occurrence of a key is found (-q), and does not treat the key as a regular expression (-F) or as a possible option to grep (-e).
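As a quick check of what those grep options buy you, here is a minimal sketch run against a scratch directory. The has_key helper, the sample file and its contents are made up for illustration; they are not part of the question.

```shell
# Scratch directory with one sample file (hypothetical data).
dir=$(mktemp -d)
printf '%s\n' 'FooxBar = 1' 'run with -v' > "$dir/sample.txt"

has_key() {
  # -r recurse, -i ignore case, -F fixed string (no regexp),
  # -q quiet (stop at the first match), -e takes the next argument as the
  # pattern even if it starts with "-".
  grep -riFqe "$1" "$dir"
}

has_key 'fooxbar' && r1=found || r1=missing  # literal, case-insensitive match
has_key 'foo.bar' && r2=found || r2=missing  # with -F, "." is not a wildcard
has_key '-v'      && r3=found || r3=missing  # not taken as the -v option
printf '%s\n' "$r1 $r2 $r3"                  # prints: found missing found
rm -rf "$dir"
```

Without -F the second key would match FooxBar, and without -e the third would be parsed as grep's own -v option.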
To avoid having to read files several times, and assuming your list of keys is one key per line (as opposed to space/tab/newline separated in the code above), you could do with GNU tools:
find . -type f -size +0 -printf '%p\0' | awk '
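  # "-" (stdin): each NUL-delimited record is a file name; queue it as input
  ARGIND == 2 {ARGV[ARGC++] = $0; next}
  # the key list: store each key lower-cased and count the keys in n
  ARGIND == 4 {a[tolower($0)]; n++; next}
  # every queued file: drop the keys found in the current line
  {
    l = tolower($0)
    for (i in a) if (index(l, i)) {
      delete a[i]
      if (!--n) exit
    }
  }
  END {
    for (i in a) print i, "has no occurrence"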
}' RS='\0' - RS='\n' /tmp/listOfKeys.txt
It's optimised in that it stops looking for a key as soon as it has seen it, stops altogether as soon as all the keys have been found, and reads the files only once.
It assumes keys are unique in listOfKeys.txt. It will output the keys in lower case.
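To see the matching logic on its own, here is a reduced sketch without the find part. It swaps in the common FNR == NR idiom to recognise the first file instead of ARGIND, and the scratch files and their contents are invented for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'Alpha' 'Beta' 'Gamma' > "$tmp/keys"            # hypothetical keys
printf '%s\n' 'some ALPHA text' 'and gamma too' > "$tmp/data"  # hypothetical data

missing=$(awk '
  # First file only (FNR == NR): remember each key in lower case.
  FNR == NR { keys[tolower($0)]; n++; next }
  # Remaining input: drop every key that occurs in the current line.
  {
    line = tolower($0)
    for (k in keys) if (index(line, k)) {
      delete keys[k]
      if (!--n) exit    # nothing left to look for
    }
  }
  END { for (k in keys) print k, "has no occurrence" }
' "$tmp/keys" "$tmp/data")

printf '%s\n' "$missing"   # prints: beta has no occurrence
rm -rf "$tmp"
```

The key was written as Beta but is reported as beta, which is the lower-casing mentioned above.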
The GNUisms in that find | awk command are -printf '%p\0', ARGIND and the ability of awk to handle NUL delimited records. The first two can be addressed with:
find . -type f -size +0 -exec printf '%s\0' {} + | awk '
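  # step 1, "-" (stdin): each NUL-delimited record is a file name; queue it
  step == 1 {ARGV[ARGC++] = $0; next}
  # step 2, the key list: store each key lower-cased and count the keys in n
  step == 2 {a[tolower($0)]; n++; next}
  # step 3, every queued file: drop the keys found in the current line
  {
    l = tolower($0)
    for (i in a) if (index(l, i)) {
      delete a[i]
      if (!--n) exit
    }
  }
  END {
    for (i in a) print i, "has no occurrence"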
}' step=1 RS='\0' - step=2 RS='\n' /tmp/listOfKeys.txt step=3
The third one could be addressed with tricks like this one, but that's probably not worth the effort. See Barefoot IO's solution for a way to bypass the problem altogether.
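The step=N arguments in the portable version rely on standard awk behaviour: a var=value operand placed between file names is only assigned when awk reaches it, just before the next file is opened. A minimal sketch with two made-up scratch files:

```shell
tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/one"   # hypothetical first input
printf '%s\n' c   > "$tmp/two"   # hypothetical second input

# step changes between the two files, so each record can tell which
# operand it came from without relying on the GNU-only ARGIND.
out=$(awk '{ print step ": " $0 }' step=1 "$tmp/one" step=2 "$tmp/two")

printf '%s\n' "$out"
# prints:
# 1: a
# 1: b
# 2: c
rm -rf "$tmp"
```

The same mechanism is what switches RS from NUL to newline between stdin and the key list.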
ctags or cscope to index your code if those strings are code symbols. – Stéphane Chazelas Mar 04 '16 at 17:42