How to measure size of piped data?

Question

I would like to do something like this:

> grep pattern file.txt | size -h
16.4 MB

or something equivalent to:

> grep pattern file.txt > grepped.txt
> ls -h grepped.txt
16.4 MB
> rm grepped.txt

(that would be a bit inconvenient, though)

Is that possible?

Stephen Kitt · Accepted Answer · 2018-02-26T13:43:55.067

40

You can use wc for this:

grep pattern file.txt | wc -c

will count the number of bytes in the output. You can post-process that to convert large values to “human-readable” format.
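As a minimal sketch of that post-processing step, assuming 1024-based units are wanted: the function name human_readable and the unit list are illustrative choices, not part of wc or any other tool mentioned here.

```python
#!/usr/bin/env python3
"""Format a byte count (such as the number printed by `wc -c`)."""


def human_readable(num_bytes):
    """Return num_bytes as a string using 1024-based units (an assumption)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "B":
        return f"{int(value)} B"
    return f"{value:.1f} {unit}"


if __name__ == "__main__":
    # Fixed demo value; in a pipeline you would parse the wc -c output instead.
    print(human_readable(17196646))  # prints "16.4 MiB"
```

To use it on real data, the number from wc -c could be read with int(sys.stdin.read()) and passed to the function.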

You can also use pv to get this information inside a pipe:

grep pattern file.txt | pv -b > output.txt

(this displays the number of bytes processed, in human-readable format).
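To illustrate the general idea of counting inside a pipe, here is a rough sketch that copies data through unchanged while adding up the bytes. It is not how pv is implemented; the function name copy_and_count and the chunk size are assumptions made for the example.

```python
#!/usr/bin/env python3
"""Copy a binary stream to another stream and count the bytes on the way."""
import io


def copy_and_count(src, dst, chunk_size=65536):
    """Copy binary stream src to dst; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Demonstration on in-memory streams. For a real pipe, pass
    # sys.stdin.buffer and sys.stdout.buffer, and print the total to
    # stderr so it does not mix with the piped data.
    source = io.BytesIO(b"Hi\n")
    sink = io.BytesIO()
    print(copy_and_count(source, sink))  # prints 3
```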

edited Feb 26 '18 at 13:43

answered Feb 26 '18 at 13:37

Stephen Kitt


I prefer wc -c because du -h reports 4.0K for anything smaller than 4.0 KiB, since it measures disk usage in whole blocks rather than counting bytes. – Stan Strum Feb 27 '18 at 16:18
If printing the output in MB is enough, the command could be | wc -c | sed 's/$/\/1024\/1024/' | bc. This appends /1024/1024 to the output and runs a calculator on the resulting string. – phil294 Dec 05 '19 at 10:30
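The difference between the byte count and the block-based figure mentioned in the du comment above can be seen from a file's metadata. This is only a sketch: the helper name sizes is made up for the example, and it assumes st_blocks is expressed in 512-byte units, as on Linux.

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_size, allocated_size) in bytes for path."""
    st = os.stat(path)
    # st_size is the length of the content, which is what `wc -c` counts.
    # st_blocks is assumed to be in 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"Hi\n")
    try:
        apparent, allocated = sizes(tmp.name)
        print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
    finally:
        os.remove(tmp.name)
```

The allocated figure depends on the filesystem, so it is not shown here; the apparent size is always the exact byte count.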

score 12 · Answer 2 · answered Feb 26 '18 at 13:44


You can use the pipeviewer tool pv with the total byte count flag -b:

$ dd if=/dev/zero bs=3 count=4211 2>/dev/null | pv -b >/dev/null
12.3KiB

$ grep pattern file.txt | pv -b >/dev/null


Bjarke Freund-Hansen


score 4 · Answer 3 · answered Feb 26 '18 at 14:05

The Pipe Viewer utility was designed for this purpose. If it's not flexible enough for your purposes, then you can implement your own FIFO data transfer measuring code with the pipeline manipulation library (libpipeline) function calls such as pipeline_pump() and pipeline_peek_size().

$ whatis pv
pv (1)               - monitor the progress of data through a pipe
$ pv -Wi 0.002 -cf /etc/hosts | wc -l
 367 B 0:00:00 [2.71MiB/s] [============================================================================>] 100%
10
$

I used it like this: $ raspivid -n -o - -t 0 | pv -Wi 0.002 > /dev/null – GO.exe, May 11 '23 at 09:33

score 1 · Answer 4 · answered Feb 27 '18 at 09:16

One could quickly brew their own solution in Python:

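#!/usr/bin/env python3
import sys

# Read raw bytes (not decoded text) in chunks, so that multi-byte
# characters and binary data are counted correctly.
count = 0
while True:
    chunk = sys.stdin.buffer.read(65536)
    if not chunk:
        break
    count += len(chunk)

print(count)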

It works like this:

$ echo "Hi" | ./count_stdin_bytes.py
3
$ echo "Hello" | ./count_stdin_bytes.py
6
$ dd if=/dev/zero bs=1 count=1024 2>/dev/null |  ./count_stdin_bytes.py 
1024

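Since in your particular case you're dealing with text data (judging from the fact that you pipe from grep), you could also make use of bash's read builtin. Note that read -n 1 reads one character at a time, so in a multibyte locale this counts characters rather than bytes. Something like this: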

$ echo "Hello" | { while read -n 1 char; do ((count++)) ;done ; echo $count; }
6
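Counting characters and counting bytes only agree for plain ASCII input. A small sketch of the difference, using a made-up sample string and assuming UTF-8 encoding:

```python
# Sample text containing one non-ASCII character (U+00E9, "é").
text = "h\u00e9llo\n"

char_count = len(text)                  # number of characters
byte_count = len(text.encode("utf-8"))  # number of bytes once encoded as UTF-8

print(char_count)  # prints 6
print(byte_count)  # prints 7, because "é" takes two bytes in UTF-8
```

A character-by-character loop would report the first number for this input, while wc -c would report the second.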

Why is this better than wc -c? The while read ... loop will probably be significantly slower. Also, the OP asked for human-readable output, as with ls -h. – phil294, Dec 05 '19 at 10:16
