I would like to track the progress of a slow operation using pv. The size of this operation's input is known in advance, but the size of its output is not, which forces me to put pv to the left of the operation in the pipeline.

The problem is that the long-running command seems to consume its whole input immediately because of buffering. This is somewhat similar to the "Turn off buffering in pipe" question, but in my case it is the consuming operation that is slow, not the producing one, and none of the answers to that question seem to work here.

Here is a simple example demonstrating the problem:

seq 20 | pv -l -s 20 | while read line; do sleep 1; done
  20 0:00:00 [13.8k/s] [=====================================>] 100%

Instead of getting updated every second, the progress bar immediately jumps to 100% and stays there for the entire 20 seconds it takes to process the input. pv could only measure the progress if the lines were processed one by one, but the entire input of the last command seems to be read into a buffer.

A somewhat longer example that also demonstrates the unknown number of output lines:

#! /bin/bash
limit=10
seq 20 | \
  pv -l -s 20 | \
  while IFS= read -r num
do
  sleep 1
  if [ "$num" -gt "$limit" ]
  then
    echo "$num"
  fi
done

Any suggestions for a workaround? Thanks!

Zoltan
  • It's certainly not the shell's read that buffers the data. POSIX shells' reads read 1 byte at a time, as you can verify with strace. – Petr Skocik Oct 22 '16 at 17:08
  • The cause may be something else indeed, as seq 20 | pv -ls 20 | pv -qL 10 shows the same behavior. – Zoltan Oct 22 '16 at 17:35
  • It is not possible to put pv to the left of the processing you are interested in. What's happening here is that seq 20 immediately outputs the entire sequence, and pv dutifully reads the whole thing and copies it to stdout, which does not block because pipes are buffered. – Kevin Oct 22 '16 at 17:50
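Kevin's explanation can be checked without pv at all. The sketch below is an illustration, not taken from the thread: `try_write` is a hypothetical helper that writes a given number of bytes into a pipe whose reader (`sleep 1`) never reads anything. It assumes Linux's default pipe capacity of 64 KiB (see pipe(7); other systems use different sizes) and a `head` that supports `-c`, as GNU and BSD `head` do. A small write completes at once even though nothing has been read, while a 1 MiB write blocks once the buffer is full and is killed by SIGPIPE when the reader exits.

```shell
#!/bin/bash
# Illustration of pipe buffering (hypothetical helper, not from the thread).
# The reader, "sleep 1", never reads anything. The writer can still finish
# as long as its data fits into the kernel's pipe buffer (64 KiB by default
# on Linux); a bigger write blocks and dies with SIGPIPE when sleep exits.
try_write() {
  local bytes=$1
  # The message goes to stderr and is printed only if head exits successfully.
  { head -c "$bytes" /dev/zero 2>/dev/null && echo "$bytes: finished" >&2; } | sleep 1
}

try_write 4096      # fits in the buffer: prints "4096: finished"
try_write 1048576   # exceeds the buffer: prints nothing
```

The same mechanism explains the behaviour in the question: the 51 bytes that `seq 20` produces fit easily into the buffer between pv and the loop, so pv counts all 20 lines as transferred before the loop has finished processing the first one.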


In your setup, the data has already passed through pv while it is still being processed on the right-hand side. You could try moving pv to the rightmost end of the pipeline instead:

seq 20 | while IFS= read -r line; do sleep 1; echo "$line"; done | pv -l -s 20 > /dev/null

Update: Regarding your updated question, perhaps the easiest solution is to use a named pipe and a background subshell to monitor the progress:

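#! /bin/bash
# On exit or interrupt, send SIGTERM to the script's whole process group so
# the background tail and pv do not keep running; "trap - SIGTERM" restores
# the default SIGTERM action, so the script simply terminates when that
# signal reaches it. This assumes the script leads its own process group,
# as it does when it is started from an interactive shell.
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
# Create the FIFO before starting the reader, so the loop can never write to
# /tmp/progress.pipe before it exists (and create a regular file instead).
rm -f /tmp/progress.pipe
mkfifo /tmp/progress.pipe
# tail -f does not stop at end-of-file, so it keeps passing lines to pv even
# though the loop below opens and closes the FIFO once per line.
(tail -f /tmp/progress.pipe | pv -l -s 20 > /dev/null) &
limit=10
seq 20 | \
  while IFS= read -r num
do
  sleep 1
  if [ "$num" -gt "$limit" ]
  then
    echo "$num"
  fi
  # Report one processed input line to the progress monitor.
  echo "$num" > /tmp/progress.pipe
done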
FloHimself
  • Thanks, that's a good suggestion, unfortunately the processing that I substituted with sleep 1 also involves printing some lines and the number of those is not known in advance. I updated my question with this detail. – Zoltan Oct 22 '16 at 17:03
  • Thanks, this is exactly what I was looking for. I also tried using mkfifo, but without tail -f it didn't work; that seems to be the critical bit that I missed. – Zoltan Oct 22 '16 at 19:34
  • Please note that this still needs some minor tweaking as in its current form it leaves behind a pv process running in the background. This should be easy to take care of so I will handle this on my own, I'm just mentioning it to make other users of your answer aware. – Zoltan Oct 22 '16 at 19:54
  • @Zoltan you are right. I've updated the script to clean up the children on exit. – FloHimself Oct 22 '16 at 20:21
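A possible variation, not taken from the answer above and only a sketch: instead of reopening the FIFO for every line (which is why `tail -f` is needed there), the loop can write its progress lines to a separate file descriptor that stays open for the whole run. Here fd 3 is connected to pv through bash process substitution, so pv sees one continuous stream and exits at end-of-file when the loop finishes, with no temporary FIFO to remove. `filter_with_progress` is a hypothetical name for the question's loop; the sketch assumes bash (for `>(...)`) and an input size known in advance, as in the question.

```shell
#!/bin/bash
# Sketch of an alternative to the FIFO answer (not the answer's own code).
# filter_with_progress is a hypothetical name for the question's loop: it
# prints the numbers greater than LIMIT (the real output, whose length is
# unknown in advance) and reports every consumed input line on fd 3.
filter_with_progress() {
  local limit=$1 num
  while IFS= read -r num; do
    sleep 1                          # stands in for the slow processing
    if [ "$num" -gt "$limit" ]; then
      echo "$num"
    fi
    echo "$num" >&3                  # one progress line per input line
  done
}

# Usage: fd 3 stays open for the whole loop, so pv (started through bash
# process substitution) sees a single stream and exits at end-of-file.
#   seq 20 | filter_with_progress 10 3> >(pv -l -s 20 > /dev/null)
```

Since bash does not necessarily wait for a process substitution to finish, pv's final update may appear just after the pipeline itself has returned.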