13

I have a file whose content is similar to the following one.

0
0
0.2
0
0
0
0

I need to remove all the lines with a single zero.
I was thinking of using grep -v "0", but this also removes the line containing 0.2. I saw that I could use the -w option, but this doesn't seem to work either.

How can I remove all the lines containing just a single 0 and keep all those lines starting with a 0?

apaderno
  • 825

8 Answers

38
grep -vx 0

From man grep:

-x, --line-regexp
       Select only those matches that exactly match the whole line.
       For a regular expression pattern, this is like parenthesizing
       the pattern and then surrounding it with ^ and $.

-w fails because the 0 in 0.2 is considered a "word", and hence this line is matched. This is because the 0 is followed by a "non-word" character (the dot). You can see this if you run the original command without -v, i.e. grep -w "0".
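
To make the difference concrete, here is a minimal sketch that runs both options on input resembling the question's file. The variable names are just for illustration.

```shell
# Sample input mirroring the question's file.
input=$(printf '0\n0\n0.2\n0\n')

# -w treats the 0 in "0.2" as a whole word, so every line matches.
word_matches=$(printf '%s\n' "$input" | grep -w 0)

# -x requires the pattern to match the entire line; -v inverts the selection.
kept=$(printf '%s\n' "$input" | grep -vx 0)

printf 'grep -w 0 matched:\n%s\n' "$word_matches"
printf 'grep -vx 0 kept:\n%s\n' "$kept"
```

With this input, grep -w 0 matches all four lines, while grep -vx 0 leaves only 0.2.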

Sparhawk
  • 19,941
  • You could also use the -F option since we're not using regex patterns, just plain string matching – glenn jackman Feb 14 '19 at 13:54
  • @glennjackman Maybe I've read this earlier, but I can't seem to find it now. Running with -F (surprisingly to me) appears to take a similar amount of time or even slightly slower (~5–10%). Hence, I'm not sure what the advantage would be. – Sparhawk Feb 14 '19 at 21:53
  • 2
    It's possible that the RegEx engine is used so often and so widely used that they have implemented a very efficient version of it, but that a "plain search" probably has not been upgraded for 30 years. – Nelson Feb 15 '19 at 03:31
  • @Sparhawk: grep presumably has a special case for regexes with no metacharacters, because that's a common use-case. It's surprising that fgrep would be slower, but it's not surprising that the overhead of noticing this special case while compiling a short pattern is negligible vs. the time to scan a large file. (If it requires a special case at all to go that fast, vs. a pattern with a character class or x.*y.) – Peter Cordes Feb 15 '19 at 14:54
  • But that's maybe an oversimplification because the input is actually many short lines (not one giant string). I forget if grep recognizes any character other than \n newline as a line separator. If not, the implicit ^ and $ can still turn into a fixed-string search like strstr(big_buf, "\n0\n"). (Or 0\n at the start of a buffer.) But we're not just looking for the first match potentially far into a big buffer, we want to efficiently filter. But anyway, in theory yes it's just a 2-byte memcmp at the start of each line, and you'd hope that both fgrep and grep would see that. – Peter Cordes Feb 15 '19 at 15:04
30

With grep:

grep -v '^0$' file

^ means beginning of the line, $ means end of the line.

14

While grep can be used for this (as other answers clearly show), let’s take a step back and think about what you actually want:

  • You have a file containing numbers
  • You want to perform filtering based on the numeric value.

Regexes interpret sequences of characters. They don’t know about numbers, only about individual digits (and regular combinations thereof). Although in your particular case there’s a simple hack around this limitation, it’s ultimately a requirement mismatch.

Unless there’s a very good reason to use grep here (e.g. because you’ve measured it, and it’s vastly more efficient, and efficiency is crucial in your case), I recommend using a different tool.

awk, for instance, can filter based on numeric comparisons, e.g.:

awk '$1 == 0' your_file

But also, to get all lines containing numbers greater than zero:

awk '$1 > 0' your_file
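
As a rough sketch of how numeric and string comparison differ when the goal is to drop the zero lines, consider the following. The sample data is made up for illustration; it is not the asker's real file.

```shell
# Hypothetical sample data: numbers plus a few ways of writing zero.
input=$(printf '0\n0.2\n0.0\n-0.0\n0\n')

# Numeric comparison: drops every value equal to zero, including 0.0 and -0.0.
numeric=$(printf '%s\n' "$input" | awk '$1 != 0')

# String comparison: drops only lines that are exactly the text "0".
string=$(printf '%s\n' "$input" | awk '$0 != "0"')

printf 'numeric filter kept:\n%s\n' "$numeric"
printf 'string filter kept:\n%s\n' "$string"
```

The numeric filter keeps only 0.2, whereas the string filter also keeps 0.0 and -0.0. Which one is right depends on whether "zero" means the value or the literal text.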

I love regex, it’s a great tool. But it’s not the only tool. As the saying goes, if all you have is grep, everything looks like a regular language.

  • 3
    I wholeheartedly agree that awk may be more elegant here... however, it will also match maybe a little bit more than what the user expects (every numerical value evaluating to 0). Ie, printf '0\n1\n-1\na\nb\n0\n0 also\n0.0\n-0.0\n0*0\n' | awk '($1 == 0)' will match: 0, 0.0 and -0.0... and also 0 also ! Not just "0". (which is sometimes what's needed, sometimes not). If the user want only "0" : awk '/^0$/' (or grep '^0$'). Also you should edit: the user needs to add ! to negate the test, so it hides 0 (and other zeroes) and displays the rest. ie: awk '!( $0 == 0)' – Olivier Dulac Feb 14 '19 at 15:20
  • 1
    @Olivier, or check the string value: $1 == "0" – glenn jackman Feb 14 '19 at 18:04
  • 1
    @OlivierDulac I explicitly used > rather than != (or, equivalently, ! (… == …)) to highlight that this is an arbitrary numerical comparison, not just equality. As for your other comment, this is entirely true but then we’re essentially back in string comparison territory and the existing solution using grep works (though awk of course also works). – Konrad Rudolph Feb 15 '19 at 11:39
  • @KonradRudolph fair points :) – Olivier Dulac Feb 15 '19 at 18:57
  • 1
    @glennjackman: nice trick indeed. But then OP would rather do test $0=="0" – Olivier Dulac Feb 15 '19 at 18:59
6

grep's -w is a bit convoluted: it splits the line into word constituents (letters, digits, and underscores) and non-word constituents (anything else). Since the 0 in 0.2 is a complete word on its own, followed by the non-word character ".", the line matches, and -v then removes it.

In this context it is easy to use sed to delete the lines that consist of exactly 0:

sed '/^0$/d' file
Inian
  • 12,807
4

When the lines you want to delete contain only a 0 followed by a newline, you can remove them with the following command:

grep -v "^0$"

The pattern ^0$ matches only a 0 that is at the beginning and at the end of a line at the same time, that is, a line consisting of just 0. The -v option then inverts the selection, so those lines are dropped and everything else is printed.

Rui F Ribeiro
  • 56,709
  • 1
    This answer is almost identical to Arkadiusz Drabczyk's, but you forgot the -v, so it doesn't work. – Sparhawk Feb 14 '19 at 08:01
  • You're right. I was typing while he posted his answer so I didn't see it has already been given. I've misread that part with the -v option, thanks! – majesticLSD Feb 14 '19 at 08:10
0
  • \b - word boundary (like -w, this also matches the 0 in 0.2, so that line is removed too)

    grep -v "\b0\b"
    
  • match beginning of line, your pattern and end of line

    grep -v "^0$"
    
  • or, as @Sparhawk suggested, -vx (-x is --line-regexp)

Note that -w works as designed, but in your case 0.2 is two words because the dot character is a word separator.

AdminBee
  • 22,803
Jakub Jindra
  • 1,462
0

Another answer for the sake of variety, assuming you have a PCRE-enabled grep

grep -Pv "^0(?!\.)"

This uses a negative lookahead to match the lines that start with a 0 that is not followed by a dot. Then -v discards those matching lines.

mrbolichi
  • 109
0

Assuming any line which is not just a single 0 has a period

grep '\.' file
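
That assumption matters. As a quick sketch with made-up input that includes an integer other than 0, filtering on the period drops that line as well, while matching the whole line does not:

```shell
# Hypothetical input that includes an integer other than 0.
input=$(printf '0\n0.2\n5\n0\n')

# Keeps only lines containing a literal period, so "5" is lost as well.
with_period=$(printf '%s\n' "$input" | grep '\.')

# Matching the whole line removes just the lone zeros.
exact=$(printf '%s\n' "$input" | grep -vx 0)

printf 'grep with period kept:\n%s\n' "$with_period"
printf 'grep -vx 0 kept:\n%s\n' "$exact"
```

Here the period filter keeps only 0.2, while grep -vx 0 keeps both 0.2 and 5.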