21

Given this minimal example

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; )

it outputs LINE 1 and then, after one second, outputs LINE 2, as expected.


If we pipe this to grep LINE

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep LINE

the behavior is the same as in the previous case, as expected.


If, alternatively, we pipe this to cat

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | cat

the behavior is again the same, as expected.


However, if we pipe to grep LINE, and then to cat,

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep LINE | cat

there is no output until one second has passed, and then both lines appear at once, which I did not expect.


Why is this happening, and how can I make the last version behave in the same way as the first three commands?

lisyarus
  • 313
  • cat concatenates files. What are you trying to do by piping into cat? – Douglas Held Sep 05 '18 at 20:09
  • 15
    @DouglasHeld When called without arguments, cat simply reads stdin and outputs into stdout. Of course, I came up with this question with a lot of complex stuff in place of echo and cat, but these turned out to be irrelevant, since the problem shows up with much simpler examples. – lisyarus Sep 05 '18 at 21:11
  • 4
    @DouglasHeld: Piping to cat is often useful to force stdout to not be a terminal. For instance, this is an easy way to get many commands to not use colorized output. – wchargin Sep 07 '18 at 05:01
  • I swear this is a duplicate of another question on Stack Overflow! – iBug Sep 07 '18 at 06:46
  • @wchargin thank you very much, you have taught me something new about posix that I never knew. – Douglas Held Oct 12 '18 at 22:01

3 Answers

41

When (at least GNU) grep’s output is not a terminal, it buffers its output, which is what causes the behaviour you’re seeing. You can disable this either using GNU grep’s --line-buffered option:

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | grep --line-buffered LINE | cat

or the stdbuf utility:

( echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; ) | stdbuf -oL grep LINE | cat

The question "Turn off buffering in pipe" has more on this topic.
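To see the difference rather than take it on trust, here is a minimal sketch that stamps each line with the whole seconds elapsed when it arrives at the end of the pipeline. The helper names `stamp` and `producer` are made up for this illustration, and the sketch assumes GNU grep, since --line-buffered is its option.

```shell
# stamp (hypothetical helper): prefix each line read from stdin with the
# whole seconds elapsed since the helper started.
stamp() {
  start=$(date +%s)
  while IFS= read -r line; do
    printf '%s s: %s\n' "$(( $(date +%s) - start ))" "$line"
  done
}

# producer (hypothetical helper): the command group from the question.
producer() { echo "LINE 1" ; sleep 1 ; echo "LINE 2" ; }

echo "grep with its default buffering:"
producer | grep LINE | stamp

echo "grep with --line-buffered (GNU grep assumed):"
producer | grep --line-buffered LINE | stamp
```

With GNU grep, the first pipeline should report both lines at the same elapsed time, because grep only writes when it exits, while the second should show LINE 2 about a second after LINE 1. The stamps are whole seconds, so read them as approximate.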

Stephen Kitt
  • 434,908
26

Simplified explanation

Like many utilities (this is not peculiar to one program), grep switches its standard output between line buffering and full buffering. With line buffering, the C library holds output data in memory until the buffer fills, a linefeed character is added to it, or the program ends cleanly, whereupon it calls write() to actually write the buffer contents. With full buffering, only the in-memory buffer becoming full (or the program ending cleanly) triggers the write().

More detailed explanation

The above is the well-known, but slightly wrong, explanation. In fact, standard output is not line buffered but smart buffered in the GNU C library and BSD C library. Standard output is also flushed when a read from standard input exhausts the in-memory buffer of pre-read input, so that the C library has to call read() to fetch some more, and that read is at the beginning of a new line. (One reason for this is to prevent deadlock when another program connects itself to both ends of a filter and expects to be able to operate line-by-line, alternating between writing to the filter and reading from it; like "coprocesses" in GNU awk for example.)

C library influence

grep and the other utilities do this (or, more strictly, the C libraries that they use do this, because this is a defined feature of programming in the C language) based upon what they detect their standard output to be. If (and only if) it is not an interactive device, they choose full buffering; otherwise they choose smart buffering. A pipe is considered not to be an interactive device, because the definition of an interactive device, at least in the world of Unix and Linux, is essentially that the isatty() call returns true for the relevant file descriptor.
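The same terminal check is available from the shell, which makes the decision easy to observe. This is a rough sketch: `stdout_kind` is a name invented for the illustration, and it uses the POSIX `[ -t 1 ]` test, which is true when file descriptor 1 is open on a terminal.

```shell
# stdout_kind (hypothetical helper): report whether this function's
# standard output (file descriptor 1) is a terminal.
stdout_kind() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

stdout_kind          # typed at an interactive prompt, fd 1 is the terminal
stdout_kind | cat    # fd 1 is a pipe, so the terminal check fails
```

A program that picks its buffering mode this way sees exactly the second situation when it is followed by `| cat`.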

Workarounds to disable full buffering

Some utilities, grep among them, have idiosyncratic options such as --line-buffered that change this decision (the option is, as you can see, misnamed). But only a vanishingly small fraction of the filter programs that one could use actually have such an option.

More generally, one can use one of two kinds of tool to affect this. The first kind digs into the specific internals of the C library and changes its decision making; such tools have security problems if the program to be altered is set-UID, are specific to particular C libraries, and indeed only work for programs written in or layered on top of the C language. The second kind, such as ptybandage, does not change the internals of the program but simply interposes a pseudo-terminal as standard output so that the decision comes out as "interactive".


JdeBP
  • 68,745
  • 1
    If the phrase "line buffered" is a misnomer, then it's not really the fault of grep, but of the underlying library calls, setbuf/setvbuf. I don't know of a reliable online reference for the C standard, but e.g. the Linux and FreeBSD man pages along with the POSIX description of setvbuf call it "line buffered". Even the symbolic constant for it is _IOLBF. – ilkkachu Sep 05 '18 at 21:19
  • Well now you've learned better. This buffering strategy is described in the GNU C library doco, albeit briefly. Laurent Bercot is more forthright on the matter. I have mentioned it too. – JdeBP Sep 06 '18 at 00:35
  • I didn’t think “Your expectation is wrong” was a good heading for this excellent explanation of output buffering. I hope you don’t mind that I removed it and added some descriptive headings for each section of the answer. – Anthony Geoghegan Sep 06 '18 at 14:28
  • 2
    @ilkkachu The C standard does indeed use "line buffered". Per 7.21.3 Files, paragraph 3: "When a stream is unbuffered, ... When a stream is fully buffered, ... When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when a new-line character is encountered. ..." In fact, the C Standard uses the exact phrase "line buffered" five times. So it's not a misnomer. – Andrew Henle Sep 06 '18 at 14:41
  • 1
    Furthermore, the approach described here as "smart buffering", as I understand it, seems to be just what the C standard describes as "line buffering". Specifically, in addition to flushing the buffer at newlines, "When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when [...] input is requested on an unbuffered stream, or when input is requested on a line buffered stream that requires the transmission of characters from the host environment." So this is not a GNU or BSD quirk, but rather what the language calls for. – John Bollinger Sep 06 '18 at 22:44
8

Use

grep --line-buffered

to make grep not buffer more than one line at a time.

choroba
  • 47,233