The issue comes from a variable whose value looks like a command-line flag, as Satō Katsura pointed out.
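As a minimal sketch of that problem and two common ways around it, assume a variable `line` holding the hypothetical value `-v` (the sample input lines are made up as well). Both `--` (end of options) and `-e` (explicit pattern) are standard grep usage:

```shell
line='-v'   # hypothetical pattern that happens to look like an option

# Unprotected, grep would parse the value as an option, not as a pattern:
#   grep -c "$line"
# Either of the following passes it as a pattern instead.
with_dashes=$(printf '%s\n' 'use -v here' 'plain text' | grep -c -- "$line")
with_e=$(printf '%s\n' 'use -v here' 'plain text' | grep -c -e "$line")

printf '%s %s\n' "$with_dashes" "$with_e"
# 1 1
```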
However, what you're doing can also be done with a single awk command:
awk 'NR==FNR {p[++i]=$0;next} {for (i in p){if (match($0,p[i])){c[i]++}}} END {for (i in p){print p[i],c[i]}}' uniq.txt stage.txt >output.txt
... if the number of patterns in uniq.txt is not in the millions.
The awk script unraveled:
    NR==FNR { p[++i] = $0; next }
    {
        for (i in p) {
            if (match($0, p[i])) {
                c[i]++
            }
        }
    }
    END {
        for (i in p) {
            print p[i], c[i]
        }
    }
It first reads each line of uniq.txt into the array p, and then counts (in the array c) how many lines of the second file contain each pattern in p.
At the end, the patterns and their corresponding counts are printed.
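As a quick check of the description above, here is a minimal sketch that runs the one-liner on made-up sample data. The file contents and the scratch directory are assumptions for illustration only. The output is piped through sort because awk does not guarantee the iteration order of `for (i in p)`.

```shell
# Scratch directory, so that real uniq.txt/stage.txt files are not overwritten.
tmpdir=$(mktemp -d)

# Hypothetical sample data: two patterns, four input lines.
printf '%s\n' 'foo' 'ba[rz]' >"$tmpdir/uniq.txt"
printf '%s\n' 'foo one' 'bar two' 'baz foo' 'nothing' >"$tmpdir/stage.txt"

# Same awk program as above; sort makes the output order predictable.
awk 'NR==FNR {p[++i]=$0;next} {for (i in p){if (match($0,p[i])){c[i]++}}} END {for (i in p){print p[i],c[i]}}' \
    "$tmpdir/uniq.txt" "$tmpdir/stage.txt" | sort >"$tmpdir/output.txt"

cat "$tmpdir/output.txt"
# ba[rz] 2
# foo 2
```

Note that a pattern which matches no line is printed with an empty count, since c[i] is never set for it; printing `c[i]+0` instead would show 0.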
This avoids a slow shell loop (executing grep and wc once for each pattern, and also opening and writing to an output file that many times), and also avoids having to deal with reading patterns into a shell variable with read.
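For comparison, the kind of loop this replaces might look roughly like the sketch below. It is a hypothetical reconstruction, not the code from the question, and it uses `grep -c` rather than grep piped to wc. The sample data are likewise made up.

```shell
# Scratch directory with hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'foo' 'ba[rz]' >"$tmpdir/uniq.txt"
printf '%s\n' 'foo one' 'bar two' 'baz foo' 'nothing' >"$tmpdir/stage.txt"

# One grep process is started per pattern: this is the slow part.
while IFS= read -r line; do
    count=$(grep -c -e "$line" "$tmpdir/stage.txt")
    printf '%s %s\n' "$line" "$count"
done <"$tmpdir/uniq.txt" >"$tmpdir/loop_output.txt"

cat "$tmpdir/loop_output.txt"
# foo 2
# ba[rz] 2
```

Keep in mind that grep interprets the patterns as basic regular expressions by default, whereas awk uses extended ones, so the two approaches can give different counts for patterns containing `+`, `?` or `|`.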
If you want to do fixed string matching, i.e. treating the lines in uniq.txt as fixed strings rather than as regular expressions (as with grep -F), just change the match($0, p[i]) function call to index($0, p[i]).
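To illustrate the difference, the following sketch runs both variants on hypothetical data in which the pattern `a.c` contains a regular expression metacharacter. The sample lines are assumptions made up for this example.

```shell
# Scratch directory with hypothetical sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 'a.c' >"$tmpdir/uniq.txt"
printf '%s\n' 'abc' 'a.c' 'xa.cx' >"$tmpdir/stage.txt"

# match(): the dot matches any character, so all three lines count.
regex_out=$(awk 'NR==FNR {p[++i]=$0;next} {for (i in p){if (match($0,p[i])){c[i]++}}} END {for (i in p){print p[i],c[i]}}' \
    "$tmpdir/uniq.txt" "$tmpdir/stage.txt")

# index(): the dot is literal, so only lines containing "a.c" count.
fixed_out=$(awk 'NR==FNR {p[++i]=$0;next} {for (i in p){if (index($0,p[i])){c[i]++}}} END {for (i in p){print p[i],c[i]}}' \
    "$tmpdir/uniq.txt" "$tmpdir/stage.txt")

printf '%s\n' "$regex_out" "$fixed_out"
# a.c 3
# a.c 2
```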
grep -- ${line}...
Other suggestions: paste your code into http://www.shellcheck.net/ to see possible improvements and avoid pitfalls. See also https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Sep 13 '17 at 15:54