3

This gives a result of 124:

awk 'BEGIN {FS = ","; count = 0}; { if ($7 ~ /Nature Life/) { count++ }} END {print count}' file.csv

This gives a result of 123:

grep -cE '^([^,]*,){6}[^,]*Nature Life' file.csv

The file is too large to read through by hand.

Any suggestions?

steve
  • A shot in the dark: check whether the last line matches and whether there is a newline at the end of the last line. – andcoz Oct 18 '15 at 19:21
  • If there are only 123 matching lines you might be able to print them and diagnose by hand – shadowtalker Oct 18 '15 at 20:52
  • In my testing both commands give the same count, but that is with assumed data rather than the real file. Could you provide some lines of the file to test? –  Oct 19 '15 at 14:32

2 Answers

4

If you want to find the inconsistency, the following should reveal the one line that the awk command is catching and the grep command is not:

awk 'BEGIN{FS=","}$7~/Nature Life/' file.csv | grep -vE '^([^,]*,){6}[^,]*Nature Life'

The objective here is to print every line that awk matches and then filter out every line that grep matches (grep -v). Most likely your grep regex does not match exactly what you intend it to.
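
The pipeline above only shows lines that awk matches and grep misses. As a rough sketch of a check in both directions, the two sets of matching line numbers can be compared with comm. The helper name compare_matches is made up for illustration, and -a is a GNU/BSD grep option assumed to be available; it keeps grep from printing a "Binary file ... matches" notice in place of the lines.

```shell
# Sketch: print line numbers matched by only one of the two commands.
# compare_matches is a hypothetical helper; pass the CSV path as $1.
compare_matches() {
    awk_out=$(mktemp) || return 1
    grep_out=$(mktemp) || { rm -f "$awk_out"; return 1; }

    # Line numbers where awk finds "Nature Life" in field 7
    awk -F, '$7 ~ /Nature Life/ { print NR }' "$1" | sort > "$awk_out"

    # Line numbers matched by the grep regex (-n prefixes "N:", cut keeps N)
    grep -anE '^([^,]*,){6}[^,]*Nature Life' "$1" | cut -d: -f1 | sort > "$grep_out"

    # comm -3 hides numbers found in both lists:
    # column 1 = awk only, column 2 (tab-indented) = grep only
    comm -3 "$awk_out" "$grep_out"

    rm -f "$awk_out" "$grep_out"
}
```

Running compare_matches file.csv should print nothing when both commands agree, and otherwise the line numbers to inspect, for example with sed -n 'Np' file.csv.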

  • The grep -v to find the odd line is clever, but your statement about not using ; after a BEGIN block is false/irrelevant; it makes no difference to the running of the awk script. – Wildcard Dec 20 '15 at 05:41
  • You're right, I shouldn't have said that. Edited. Thanks! – user.friendly Dec 21 '15 at 14:15
1

For GNU grep at least, in a UTF-8 locale ,[^,]*, will not match ,something, if something contains sequences of bytes that don't form valid characters.

For instance:

$ printf '1,\200,3,4,5,6,Nature Life,8\n' |
   grep -cE '^([^,]*,){6}[^,]*Nature Life'
0

While, for awk field splitting, it does not matter:

$ printf '1,\200,3,4,5,6,Nature Life,8\n' | awk -F, '$7 ~ /Nature Life/'
1,�,3,4,5,6,Nature Life,8

Run grep under LC_ALL=C to avoid issues with text in the wrong encoding (as long as the string to search and the separator (,) are in ASCII).

$ printf '1,\200,3,4,5,6,Nature Life,8\n' |
   LC_ALL=C grep -cE '^([^,]*,){6}[^,]*Nature Life'
1
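
To track down which lines might carry such bytes in a large file, one option is to list every line containing a byte outside printable ASCII. This is only a heuristic sketch: it also flags valid multibyte UTF-8 characters and control characters, not just invalid sequences. The helper name list_non_ascii_lines is made up for illustration, and -a is a GNU/BSD grep option assumed to be available.

```shell
# Sketch: list lines (prefixed with "N:") that contain any byte that is
# neither printable ASCII nor whitespace. In the C locale every byte is a
# single character, so bytes >= 0x80 fall outside [:print:].
# list_non_ascii_lines is a hypothetical helper; pass the file path as $1.
list_non_ascii_lines() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}
```

For example, list_non_ascii_lines file.csv | cut -d: -f1 prints just the line numbers, which can then be compared with the lines where the two counts disagree.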