3

I know that squeezing multiple blank lines can be done using cat -s (and squeezing all blank lines can be done using tr -s '\n'), but I'm curious how to search for this condition in a stream of input.

I thought that stream-of-input | grep -qz $'\n\n\n' would do it, but it doesn't.

Is there a way to do this search with simple tools?

In other words, read input, and exit with a zero status if three consecutive bytes are newline characters, or exit with nonzero status if EOF is reached without finding three consecutive newline characters.

Wildcard
  • 36,499

2 Answers

3

You can use tr to transform the stream into one you can grep normally:

stream | tr 'x\n' '\0x' | grep -qz xxx

This turns all x bytes into null bytes, and all linefeed bytes into xs, which can be grepped out as usual. That is, it moves one step along the path linefeed -> x -> null, so a sequence of three linefeeds will now be a sequence of three xs, and no other x bytes will occur (they will have become nulls terminating lines for the grep).
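As a quick way to see the transformation at work, here is a minimal sketch that wraps the pipeline in a function. The name has_triple_newline is made up for illustration, and the sketch assumes a grep that supports -z (GNU grep, for example).

```shell
# has_triple_newline: hypothetical helper name; reads stdin.
# Exit status 0 if the input contains three consecutive newline bytes,
# nonzero otherwise. Assumes grep supports the -z extension.
has_triple_newline() {
    tr 'x\n' '\0x' | grep -qz xxx
}

# Three newlines in a row become "xxx" and are found.
printf 'a\n\n\nb\n' | has_triple_newline && echo found
# Only two in a row: no "xxx" appears after the transformation.
printf 'a\n\nb\n' | has_triple_newline || echo not-found
# Literal x bytes in the input turn into NULs, so they cannot match.
printf 'xxx\n' | has_triple_newline || echo not-found
```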


This works with POSIX tr, but grep -z is an extension. You may not need it - the separation behaviour isn't required here - and most greps will handle binary data, but POSIX grep is only required to work on text files so you're going to be depending on an extension one way or another.

If your real data is a text file, or just doesn't depend on binary-safe behaviour, you can probably survive on just

stream | tr 'x\n' '\nx' | grep -q xxx

- that is, just swapping the two bytes. This is not quite POSIX-compliant, but it will likely work in practice just about anywhere. The issue is that the final line of the transformed stream won't end with a newline, so the stream is not a text file and grep isn't strictly required to accept it.
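One way to close that gap is to append a newline after tr has finished, so that grep always sees a properly terminated final line. A minimal sketch, with has_triple_newline_posix as a made-up name:

```shell
# has_triple_newline_posix: hypothetical helper name; reads stdin.
# Swaps x and newline, then appends one newline so the last line
# handed to grep is terminated. No grep extensions are used.
has_triple_newline_posix() {
    { tr 'x\n' '\nx'; printf '\n'; } | grep -q xxx
}

printf 'a\n\n\nb\n' | has_triple_newline_posix && echo found
printf 'x\n\nx\n' | has_triple_newline_posix || echo not-found
```

The line-length caveat still applies: input with few x bytes reaches grep as very long lines.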

One possible issue in either case is that a file with no existing x bytes will be considered as one very long line, which may exceed the limits your grep implementation will handle. Choosing another expected-to-be-common byte may work around that.
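If the pivot byte needs to vary with the data, it can be made a parameter. The sketch below is only an illustration: has_triple_newline_with is a hypothetical name, and it assumes the chosen byte is an ordinary character with no special meaning to tr (so not a backslash, hyphen, or bracket).

```shell
# has_triple_newline_with: hypothetical helper; $1 is the single byte
# to swap with newline. Pick one that occurs often in the data so the
# transformed lines stay short.
has_triple_newline_with() {
    c=$1
    # grep -F treats the pattern as a fixed string, not a regex.
    { tr "$c\n" "\n$c"; printf '\n'; } | grep -qF -- "$c$c$c"
}

printf 'see here\n\n\n' | has_triple_newline_with e && echo found
printf 'eee\n\n' | has_triple_newline_with e || echo not-found
```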

I was surprised that your original grep -qz $'\n\n\n' command didn't work, but it had a false-positive problem for me - it seemed to behave like grep -qz '' and always matched. I'm not sure why that is.
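A plausible explanation, not confirmed anywhere in this thread: grep treats newlines inside its pattern argument as separators between patterns, so a pattern consisting only of newlines is read as a list of empty patterns, and an empty pattern matches every line. The check below builds the same string as $'\n\n\n' with a portable quoted literal (the variable name nl3 is made up here) and uses plain -q rather than -qz.

```shell
# nl3 holds exactly three newline characters, like $'\n\n\n' in bash.
nl3='


'

# The input contains no blank lines at all, yet grep reports a match.
printf 'abc\n' | grep -q "$nl3" && echo 'matched (false positive)'
# Same outcome as an explicitly empty pattern.
printf 'abc\n' | grep -q '' && echo 'matched'
```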

Michael Homer
  • 76,565
  • Actually, grep -z is unneeded. stream | tr 'x\n' '\nx' | grep -q xxx works perfectly and is POSIX. Thanks! – Wildcard Jan 08 '19 at 04:53
  • 1
    There is one really technical point that means it's still not strictly POSIX - the final line won't have a terminating line feed, so it's strictly speaking not a text file and grep isn't required to accept it - but for all practical purposes it should be fine. | (tr 'x\n' '\nx' ; printf '\n') | is fully compliant, up to x being too infrequent. – Michael Homer Jan 08 '19 at 05:07
2

lex (or flex) could handle this. For example, save the following to the file tresn.l. The extra rules are mostly there to suppress the default behaviour of echoing unmatched input to stdout (though you may want that?):

%%
\n\n\n  { exit(0); }
<<EOF>> { exit(1); }
\n\n    { ; }
\n      { ; }
.       { ; }
%%

This compiles with make's implicit rules, plus a flag to pull in libfl*:

$ CFLAGS=-lfl make tresn
lex  -o lex.tresn.c tresn.l
cc -lfl   -o tresn lex.tresn.c  -ll
rm -f lex.tresn.c
$ printf "\n\n" | ./tresn ; echo $?
1
$ printf "\n\n\n" | ./tresn ; echo $?
0

On some systems you may need to add -L/opt/local/lib or similar to CFLAGS, or to LDFLAGS, if libfl* is installed by a ports or package system outside the vendor's default compiler search paths.

thrig
  • 34,938