Our RHEL 7 machines have very long log files, and I asked about the buffering of cut in this question. That question remains open, but a bit of experimentation revealed a different issue.
I decided to try using cut by bytes rather than by characters, and discovered that the output buffering differs between the two on one machine but not on the other.
On one machine, the two loops:
for ((ii=0;ii<5;ii++)); do date; usleep 500000 ; done | cut -b 1-99
for ((ii=0;ii<5;ii++)); do date; usleep 500000 ; done | cut -c 1-99
(observe the -c vs -b option for cut) both display the dates five times, as the loops are progressing.
On the other machine, this loop doing a cut by byte:
for ((ii=0;ii<5;ii++)); do date; usleep 500000 ; done | cut -b 1-99
displays the times as the loop is progressing, while this loop:
for ((ii=0;ii<5;ii++)); do date; usleep 500000 ; done | cut -c 1-99
holds the output until the loop is complete. If I set it to run forever, it displays a batch of times for every 8192 bytes of output. There are two times per second, as expected, but the output is buffered.
Two questions:
- Why is one system different from the other?
- Why is the output buffering different for the two usages of cut?
-b cuts by byte count, while -c cuts by character count. In the modern Unicode and UTF-8 and UTF-16 era, not all characters are one byte. – DopeGhoti Oct 27 '22 at 00:26
LANG=en_US and others LANG=en_US.UTF-8. If I set the locale to en_US I stop having the problem with cut. – user1683793 Oct 27 '22 at 21:03
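Following up on the locale comment above, here is a minimal sketch for comparing the two machines. It prints the locale settings that cut inherits, then reruns the loop with the locale overridden for the cut process only. The helper name cut_loop is made up for illustration, sleep 0.5 stands in for usleep 500000, and using LC_ALL=C as the single-byte locale is my assumption (the comment only mentions en_US). I have not confirmed why the multibyte locale changes the buffering, so treat this as a way to reproduce the difference, not an explanation.

```shell
#!/usr/bin/env bash
# Show the locale settings the shell (and therefore cut) inherits.
command -v locale >/dev/null && locale

# cut_loop is a hypothetical helper: it reruns the loop from the question,
# overriding the locale for the cut process only.
# $1 = locale to try (e.g. C, en_US, en_US.UTF-8), $2 = number of iterations.
cut_loop() {
    local loc=$1 count=${2:-5} ii
    for ((ii = 0; ii < count; ii++)); do
        date
        sleep 0.5   # same delay as "usleep 500000" in the question
    done | LC_ALL="$loc" cut -c 1-99
}

cut_loop C            # single-byte locale (assumed stand-in for en_US)
cut_loop en_US.UTF-8  # multibyte locale, if it is installed on this machine
```

Run it in a terminal on both machines. If the locale is the difference, the first call should print each date as it is produced, while the second may hold its output on the affected machine.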