540

I have a script which calls two commands:

long_running_command | print_progress

The long_running_command prints progress but I'm unhappy with it. I'm using print_progress to make it nicer (namely, I print the progress in a single line).

The problem: Connecting a pipe to stdout also activates a 4K buffer, so the nice print program gets nothing ... nothing ... nothing ... a whole lot ... :)

How can I disable the 4K buffer for the long_running_command (no, I do not have the source)?

  • 2
    So when you run long_running_command without piping you can see the progress updates properly, but when piping they get buffered? –  Jun 16 '09 at 10:58
  • 2
    Yes, that's exactly what happens. – Aaron Digulla Jun 16 '09 at 11:50
  • 39
    The lack of a simple way of controlling buffering has been a problem for decades. For example, see: http://marc.info/?l=glibc-bug&m=98313957306297&w=4 which basically says "I can't be arsed doing this and here's some clap-trap to justify my position" –  Oct 19 '10 at 21:59
  • 2
    http://serverfault.com/a/589614/67097 – Nakilon Feb 09 '15 at 09:08
  • 10
    It is actually stdio not the pipe that causes a delay while waiting for enough data. Pipes do have a capacity, but as soon as there is any data written to the pipe, it is immediately ready to read at the other end. – Sam Watkins Dec 16 '16 at 11:37

15 Answers

574

Another way to skin this cat is to use the stdbuf program, which is part of the GNU Coreutils (FreeBSD also has its own version).

stdbuf -i0 -o0 -e0 command

This turns off buffering completely for input, output and error. For some applications, line buffering may be more suitable for performance reasons:

stdbuf -oL -eL command

Note that it only affects stdio buffering (printf(), fputs(), ...), only works for dynamically linked applications, and only if the application doesn't otherwise adjust the buffering of its standard streams itself, though that should cover most applications.
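
Applied to the pipeline from the question, a minimal sketch (assuming long_running_command uses stdio and is dynamically linked) could be:

stdbuf -oL long_running_command | print_progress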

a3nm
  • 9,207
  • 9
    "unbuffer" needs to be installed in Ubuntu, which is inside the package: expect-dev which is 2MB... – lepe Jun 27 '13 at 06:21
  • 2
    This works great on the default raspbian install to unbuffer logging. I found sudo stdbuf … command works although stdbuf … sudo command didn't. – natevw Jul 10 '13 at 06:05
  • 32
    @qdii stdbuf does not work with tee, because tee overwrites the defaults set by stdbuf. See the manual page of stdbuf. – ceving Jun 30 '14 at 11:51
  • 5
    @lepe Bizarrely, unbuffer has dependencies on x11 and tcl/tk, meaning it actually needs >80 MB if you're installing it on a server without them. – lambshaanxy Aug 28 '14 at 12:27
  • 1
    Excellent! Wish I could upvote more. I had a background process that was not writing to a log ... even after being terminated, it wouldn't write the buffer out. stdbuf worked like a charm! – Rado Feb 06 '15 at 17:41
  • @lepe, not that bizarre, since it's part of Expect, thus written in TCL, and TCL and Tk are often packaged together... and Tk has that whole, fun chain. – Charles Duffy May 07 '15 at 17:21
  • @CharlesDuffy - indeed TCL/Tk are seemingly inseparable. jim is usually preferable where TCL is concerned, from where I sit. It doesn't pack some ginormous dot-net style dependency chain in for a simple scripting language. – mikeserv Jun 15 '15 at 21:53
  • 1
    FWIW, I can't get this to work at all. I'm trying to pipe a long-running command to ts (timestamper), and I can't find any variant that works, even with sudo on the inside. Example: sudo stdbuf -o0 -e0 -i0 /usr/local/bin/ec2-snapshot-all | stdbuf -o0 -e0 -i0 ts (and moving the sudo doesn't help).

    In contrast, the socat solution works perfectly.

    – rlpowell Jul 23 '15 at 23:57
  • @rlpowell, setting the value to 0 will make it unbuffered. You need to set it to a value higher than 0 or, if you want, set it to L and it will be line buffered. Type stdbuf --help for all the info you need. – Adrian Sep 05 '15 at 14:01
  • @Adrian Yes, unbuffered is what I want, but it stays buffered in nice, big multiline chunks. – rlpowell Sep 06 '15 at 17:24
  • 16
    @qdii stdbuf uses the LD_PRELOAD mechanism to insert its own dynamically loaded library libstdbuf.so. This means that it will not work with these kinds of executables: those with setuid or file capabilities set, statically linked ones, and those not using the standard libc. In these cases it is better to use the solutions with unbuffer / script / socat. See also stdbuf with setuid/capabilities. – pabouk - Ukraine stay strong Oct 12 '15 at 09:20
  • Does stdbuf have hybrid line + block buffering? That is output when a newline is encountered, or when the buffer reaches a high watermark? – CMCDragonkai Apr 24 '16 at 16:58
  • when piping commands do I need to prefix each segment with stdbuf? – jchook Jul 20 '16 at 15:28
  • 1
    This didn't work for me, piping mycommand | tee /dev/tty | awk 'awkstuff' -- I tried putting stdbuf -o0 -e0 -i0 in front of each and all three of these commands. What ended up working was using awk (actually mawk) as awk -Winteractive. – brandones Aug 22 '18 at 21:46
  • @brandones: this is probably because of an internal buffer of mawk on which stdbuf cannot act. See also https://www.perkin.org.uk/posts/how-to-fix-stdio-buffering.html – a3nm Aug 22 '18 at 22:44
  • 3
    @jchook Yes, what was said in the accepted answer using unbuffer above also applies here: "for longer pipelines, you may have to unbuffer each command" – shaneb Dec 04 '18 at 15:18
  • 1
    Is there a way to force the pipe/printf buffers externally for an already running process with a known PID? – mvorisek Jun 09 '19 at 22:31
  • @Mvorisek: I don't know, I guess this could be asked as a separate question. – a3nm Jun 09 '19 at 22:53
  • Note that this will not work if the command overrides the buffer settings (e.g. python commands). See https://stackoverflow.com/questions/55654364/why-stdbuf-has-no-effect-on-python – Rufus Apr 26 '21 at 07:23
  • @pabouk-Ukrainestaystrong How can I verify whether stdbuf works for a specific command or not, say adb. – John Jun 01 '22 at 12:31
  • 1
    @John What do you mean by "verify stdbuf"? You can just try to use it. If you want to go the hard way to check it before trying, you can verify if the binary calls the IO stream functions https://www.gnu.org/software/libc/manual/html_node/I_002fO-on-Streams.html To do this you can run ltrace -x '*@libc.so*' your_binary and analyze the output :) – pabouk - Ukraine stay strong Jun 01 '22 at 16:30
  • @pabouk-Ukrainestaystrong Sorry for my poor English. I mean, how can I verify whether stdbuf -i0 -o0 -e0 works for a specific command or not, say adb. – John Jun 02 '22 at 01:47
  • @John There is nothing new I can add. Just try to use it and check if it has the effect you expect. If it behaves differently, you can check the library calls using ltrace, but this needs knowledge and time. – pabouk - Ukraine stay strong Jun 02 '22 at 07:44
338

You can use the unbuffer command (which comes as part of the expect package), e.g.

unbuffer long_running_command | print_progress

unbuffer connects to long_running_command via a pseudoterminal (pty), which makes the system treat it as an interactive process, therefore not using the 4-kiB buffering in the pipeline that is the likely cause of the delay.

For longer pipelines, you may have to unbuffer each command (except the final one), e.g.

unbuffer x | unbuffer -p y | z
Stephen Kitt
  • 434,908
  • 4
    In fact, the use of a pty to connect to interactive processes is true of expect in general. –  Jun 17 '09 at 07:58
  • 24
    When pipelining calls to unbuffer, you should use the -p argument so that unbuffer reads from stdin. –  Oct 06 '09 at 20:18
  • 34
    Note: On debian systems, this is called expect_unbuffer and is in the expect-dev package, not the expect package – bdonlan Jan 24 '11 at 11:14
  • When I type Ctrl+Z to suspend the program, unbuffer doesn't pass the word along, so the program continues to use CPU. I'm on Ubuntu 10.04. – Joey Adams Jan 26 '13 at 23:33
  • 5
    @bdonlan: At least on Ubuntu (debian-based), expect-dev provides both unbuffer and expect_unbuffer (the former is a symlink to the latter). The links are available since expect 5.44.1.14-1 (2009). – jfs Apr 11 '13 at 13:00
  • 3
    unbuffer is in the main expect package on debian now (it's still a symlink to expect_unbuffer, which is also in the main expect package) – cas Nov 04 '15 at 23:50
  • 1
    Note: On Ubuntu 14.04.x systems, it's also in the expect-dev package. – Alexandre Mazel Dec 11 '15 at 16:08
  • Iʼm getting weird errors with unbuffer. A minimum reproducible example for me is unbuffer cat /dev/urandom > /dev/null; this causes a segfault within a couple seconds. This does not happen with /dev/zero or without unbuffer, making me think expect is trying to do something with the output. Does that seem plausible? – Daniel H Jun 30 '20 at 09:14
  • This is not working for me when trying to unbuffer ffmpeg – Michael May 12 '22 at 17:57
106

For grep, sed and awk you can force output to be line buffered. You can use:

grep --line-buffered

Force output to be line buffered.  By default, output is line buffered when standard output is a terminal and block buffered otherwise.

sed -u

Make output line buffered.

See this page for more information: http://www.perkin.org.uk/posts/how-to-fix-stdio-buffering.html
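
As a sketch in terms of the question's pipeline (the pattern and sed expression are placeholders; GNU grep and GNU sed are assumed, and fflush() is supported by gawk and mawk):

long_running_command | grep --line-buffered some_pattern | print_progress
long_running_command | sed -u 's/old/new/' | print_progress
long_running_command | awk '{ print; fflush() }' | print_progress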

Braiam
  • 35,991
yaneku
  • 1,069
  • 11
    Notably python also supports the -u parameter to disable buffering. – David Parks Dec 19 '19 at 21:52
  • 1
    Using grep (etc.) like this won't work. By the time you've executed long_running_command it's too late. It'll be buffered before it even gets to grep. – tgm1024--Monica was mistreated Jan 24 '20 at 15:26
  • This is still buffered. What if one wants to see the line progress... – Michael Feb 03 '20 at 18:48
  • Should this work with grep --line-buffered pattern *many*many*files* | head? It looks like grep processes all the files before feeding the output lines to head – golimar Feb 28 '20 at 11:27
  • And here I thought I was a grep power user. – BaseZen Jan 02 '21 at 03:31
  • Ah, fantastic! Here is me watching IOPS on a ZFS resilver: zpool iostat -vH tank 1 | grep --line-buffered ata-TOSHIBA_HDWG460_NVYXDA0GMFR1H | awk '{print $5;fflush()}' | average . Notice the fflush() in awk to get it to also unbuffer. – Bill McGonigle Mar 16 '23 at 21:23
105

Yet another way to turn on line-buffered output for the long_running_command is to use the script command, which runs your long_running_command in a pseudo-terminal (pty).

script -q /dev/null long_running_command | print_progress      # (FreeBSD, Mac OS X)
script -q -c "long_running_command" /dev/null | print_progress # (Linux)
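
If you also need the exit status of long_running_command (see the comments below), the util-linux script has a -e option to return the child's exit code; a tentative variant:

script -e -q -c "long_running_command" /dev/null | print_progress # (Linux, util-linux)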
ste
  • 115
chad
  • 1,051
  • 20
    +1 nice trick, since script is such an old command, it should be available on all Unix-like platforms. – Aaron Digulla Jan 20 '13 at 13:01
  • 6
    you also need -q on Linux: script -q -c 'long_running_command' /dev/null | print_progress – jfs Apr 11 '13 at 12:51
  • 1
    It seems like script reads from stdin, which makes it impossible to run such a long_running_command in the background, at least when started from an interactive terminal. To work around this, I was able to redirect stdin from /dev/null, since my long_running_command doesn't use stdin. – haridsv Nov 15 '13 at 12:44
  • 1
    Even works on Android. – not2qubit Jul 02 '14 at 23:36
  • 1
    I can confirm that this also works on Mac – Umur Kontacı May 17 '15 at 23:22
  • 3
    One significant disadvantage: ctrl-z no longer works (i.e. I can't suspend the script). This can be fixed by, for example: echo | sudo script -c /usr/local/bin/ec2-snapshot-all /dev/null | ts , if you don't mind not being able to interact with the program. – rlpowell Jul 24 '15 at 00:03
  • None of the other commands exist in Busybox; this one does, but when I tried it, it didn't work. Could just be my situation though, I was doing something weird with file descriptors. – CMCDragonkai Feb 20 '16 at 08:35
  • 1
    Great. This works to make the control characters work with script -q -c 'python -c "import pdb, sys; pdb.set_trace()"' /dev/null | tee -a /tmp/tmp.txt. – blueyed Sep 01 '16 at 10:40
  • Note that this doesn't work with the script command in SmartOS/Solaris derivatives – Ed L Jan 12 '17 at 20:34
  • 1
    A disadvantage that I found was that script will mask the command return status with its own (usually 0). If the print_progress or the rest of the compound command depends on it (by using || or && constructs), it'll not work as expected. – milton Oct 08 '17 at 23:40
  • 1
    Be very careful if the script itself is in the middle of a pipe, I did cat a | script python b.py | sink and it overwrote b.py. Had I not had the file open in vim I'd be out an hour of work. – user1055947 Apr 19 '18 at 17:56
  • 3
    Using script worked for me where stdbuf did not. Use script -e -c <cmd> /dev/null if you want script to return the exit code of <cmd>. – ntc2 Dec 24 '18 at 18:09
59

If it is a problem with libc modifying its buffering/flushing behavior when output does not go to a terminal, you should try socat. You can create a bidirectional stream between almost any kind of I/O mechanism. One of those is a forked program speaking to a pseudo tty.

 socat EXEC:long_running_command,pty,ctty STDIO 

What it does is

  • create a pseudo tty
  • fork long_running_command with the slave side of the pty as stdin/stdout
  • establish a bidirectional stream between the master side of the pty and the second address (here it is STDIO)

If this gives you the same output as long_running_command, then you can continue with a pipe.
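
For instance, a sketch of that continuation, reusing the names from the question, could look like this:

socat EXEC:long_running_command,pty,ctty STDIO | print_progress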

Edit: Wow, I did not see the unbuffer answer! Well, socat is a great tool anyway, so I might just leave this answer.

shodanex
  • 701
25

You can use

long_running_command 1>&2 |& print_progress

The problem is that libc will line-buffer when stdout goes to a screen and block-buffer when stdout goes to a file, but not buffer stderr at all.

I don't think the problem is the pipe buffer; it's all about libc's buffering policy.

forest
  • 2,655
Wang HongQin
  • 407
  • You're right; my question is still: How can I influence libc's buffer policy without recompiling? – Aaron Digulla Apr 04 '14 at 08:55
  • @StéphaneChazelas fd 1 will be redirected to stderr – Wang HongQin Aug 07 '15 at 09:26
  • @StéphaneChazelas I don't get your point. Please do a test, it works – Wang HongQin Aug 07 '15 at 09:53
  • 6
    OK, what's happening is that with both zsh (where |& comes from, adapted from csh) and bash, when you do cmd1 >&2 |& cmd2, both fd 1 and 2 are connected to the outer stdout. So it works at preventing buffering when that outer stdout is a terminal, but only because the output doesn't go through the pipe (so print_progress prints nothing). So it's the same as long_running_command & print_progress (except that print_progress stdin is a pipe that has no writer). You can verify with ls -l /proc/self/fd >&2 |& cat compared to ls -l /proc/self/fd |& cat. – Stéphane Chazelas Aug 07 '15 at 10:44
  • 6
    That's because |& is short for 2>&1 |, literally. So cmd1 |& cmd2 is cmd1 1>&2 2>&1 | cmd2. So, both fd 1 and 2 end up connected to the original stderr, and nothing is left writing to the pipe. (s/outer stdout/outer stderr/g in my previous comment). – Stéphane Chazelas Aug 07 '15 at 10:48
  • This does not solve the original problem but helps when using tee as the print_progress function. With this solution you get console output immediately but tee output remains buffered. – Dima Chubarov Jun 22 '20 at 12:16
11

It used to be the case, and probably still is the case, that when standard output is written to a terminal, it is line buffered by default - when a newline is written, the line is written to the terminal. When standard output is sent to a pipe, it is fully buffered - so the data is only sent to the next process in the pipeline when the standard I/O buffer is filled.

That's the source of the trouble. I'm not sure whether there is much you can do to fix it without modifying the program writing into the pipe. You could use the setvbuf() function with the _IOLBF flag to unconditionally put stdout into line buffered mode. But I don't see an easy way to enforce that on a program. Or the program can do fflush() at appropriate points (after each line of output), but the same comment applies.

I suppose that if you replaced the pipe with a pseudo-terminal, then the standard I/O library would think the output was a terminal (because it is a type of terminal) and would line buffer automatically. That is a complex way of dealing with things, though.

  • 1
    Actually, it's an easy way of dealing with things when, as the question says, altering the program code is not an option. https://unix.stackexchange.com/a/215071/5132 – JdeBP Jan 17 '20 at 19:11
8

I know this is an old question and it already has a lot of answers, but if you wish to avoid the buffering problem, just try something like:

stdbuf -oL tail -f /var/log/messages | tee -a /home/your_user_here/logs.txt

This will output the logs in real time and also save them to the logs.txt file, and buffering will no longer affect the tail -f command.

Alois Mahdal
  • 4,440
Marin N.
  • 182
5

I don't think the problem is with the pipe. It sounds like your long-running process is not flushing its own buffer frequently enough. Changing the pipe's buffer size would be a hack to get around it, but I don't think it's possible without rebuilding the kernel, something you wouldn't want to do as a hack, as it would probably adversely affect a lot of other processes.

  • 20
    The root cause is that libc switches to 4k buffering if the stdout is not a tty. – Aaron Digulla Jun 16 '09 at 11:50
  • 5
    That is very interesting! Because pipes don't cause any buffering. They provide buffering, but if you read from a pipe, you get whatever data is available; you don't have to wait for a buffer in the pipe to fill. So the culprit would be the stdio buffering in the application. –  Jun 16 '09 at 13:58
3

In a similar vein to chad's answer, you can write a little script like this:

# save as ~/bin/scriptee, or so
script -q /dev/null sh -c 'exec cat > /dev/null'

Then use this scriptee command as a replacement for tee.

my-long-running-command | scriptee

Alas, I can't seem to get a version like that to work perfectly on Linux, so it seems limited to BSD-style unixes.

On Linux, this is close, but you don't get your prompt back when it finishes (until you press enter, etc)...

script -q -c 'cat > /proc/self/fd/1' /dev/null
jwd
  • 1,467
  • Why does that work? Does "script" turn off buffering? – Aaron Digulla Dec 06 '16 at 10:09
  • @Aaron Digulla: script emulates a terminal, so yes, I believe it turns off buffering. It also echoes back each character sent to it - which is why cat is sent to /dev/null in the example. As far as the program running inside script is concerned, it is talking to an interactive session. I believe it's similar to expect in this regard, but script likely is part of your base system. – jwd Dec 07 '16 at 18:54
  • The reason I use tee is to send a copy of the stream to a file. Where does the file get specified to scriptee? – Bruno Bronosky Jul 31 '19 at 19:26
  • @BrunoBronosky: You are right, it is a bad name for this program. It is not really doing a 'tee' operation. It is just disabling buffering of output, per the original question. Maybe it should be called "scriptcat" (though it's not doing concatenation either...). Regardless, you can replace the cat command with tee myfile.txt, and you should get the effect you want. – jwd Jul 31 '19 at 21:51
2

jq has --unbuffered flag:

Flush the output after each JSON object is printed (useful if you're piping a slow data source into jq and piping jq's output elsewhere).
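
For example, assuming long_running_command emits JSON objects and .status is a placeholder filter:

long_running_command | jq --unbuffered '.status' | print_progress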

  • A lot of tools have options like this but my question is really about how to fix that for any tool. I guess I'll have to file a feature request to GLIBC. – Aaron Digulla Jun 19 '20 at 09:12
  • 1
    I don't see where OP mentions using jq. – amphetamachine Feb 17 '22 at 16:04
  • 1
    Well... the other commenters aren't wrong, but I'm glad you mentioned jq's --unbuffered flag. I've just been trying to pipe the output of mosquitto_sub to jq in a Git Bash terminal. I'm troubleshooting a hardware problem, and I need a 'live' readout of a few values. The messages that get splatted to the terminal are too big to see clearly, so I wanted to filter them using jq, got bit by this exact same buffering behavior. Using jq --unbuffered solved it neatly, thank you! – evadeflow Nov 14 '23 at 03:12
2

Python has the -u (unbuffered) flag.

$ man python3
[...]
       -u     Force the stdout and stderr streams to be unbuffered.  This option has no effect on the stdin stream.
[...]
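
Applied to the question's pipeline (long_running_script.py is a hypothetical Python producer), either the flag or the PYTHONUNBUFFERED environment variable works:

python3 -u long_running_script.py | print_progress
PYTHONUNBUFFERED=1 python3 long_running_script.py | print_progress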
1

I found this clever solution: (echo -e "cmd 1\ncmd 2" && cat) | ./shell_executable

This does the trick. cat will read additional input (until EOF) and pass that to the pipe after the echo has put its arguments into the input stream of shell_executable.

  • 3
    Actually, cat doesn't see the output of the echo; you just run two commands in a subshell and the output of both is sent into the pipe. The second command in the subshell ('cat') reads from the parent/outer stdin, that's why it works. – Aaron Digulla Nov 09 '16 at 11:19
0

According to this post here, you could try reducing the pipe ulimit to one single 512-byte block. It certainly won't turn off buffering, but well, 512 bytes is way less than 4K :3

RAKK
  • 1,352
-3

According to this, the pipe buffer size seems to be set in the kernel, and altering it would require you to recompile your kernel.

second
  • 111