23

I would like to do something like this:

> grep pattern file.txt | size -h
16.4 MB

or something equivalent to:

> grep pattern file.txt > grepped.txt
> ls -lh grepped.txt
16.4 MB
> rm grepped.txt

(that would be a bit inconvenient, though)

Is that possible?

ilkkachu
  • 138,973
Raffael
  • 941

4 Answers

40

You can use wc for this:

grep pattern file.txt | wc -c

will count the number of bytes in the output. You can post-process that to convert large values to “human-readable” format.
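
For the human-readable part, one option is to pipe the count through numfmt. This is a minimal sketch that assumes GNU coreutils is installed (numfmt is not part of POSIX); hsize is just a hypothetical helper name, not an existing command:

```shell
# hsize (hypothetical helper): count the bytes arriving on stdin and
# print the total with a binary-unit suffix (KiB, MiB, ...) via GNU numfmt.
hsize() {
    wc -c | numfmt --to=iec-i --suffix=B
}

# Example: measure 2048 bytes of zeros
head -c 2048 /dev/zero | hsize
```

With that defined, grep pattern file.txt | hsize gives roughly the size -h behaviour from the question. Use --to=si instead if you prefer powers of 1000.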

You can also use pv to get this information inside a pipe:

grep pattern file.txt | pv -b > output.txt

(this displays the number of bytes processed, in human-readable format).

Stephen Kitt
  • 434,908
  • 2
    I prefer wc -c because du -h reports 4.0K for anything smaller than 4.0K, since it counts whole blocks – Stan Strum Feb 27 '18 at 16:18
  • If printing the output in MB is enough, the command could be | wc -c | sed 's/$/\/1024\/1024/' | bc. This appends /1024/1024 to the output and runs a calculator on the resulting string. – phil294 Dec 05 '19 at 10:30
12

You can use the pipeviewer tool pv with the total byte count flag -b:

$ dd if=/dev/zero bs=3 count=4211 2>/dev/null | pv -b >/dev/null
12.3KiB

$ grep pattern file.txt | pv -b >/dev/null

4

The Pipe Viewer utility was designed for this purpose. If it's not flexible enough for your purposes, then you can implement your own FIFO data transfer measuring code with the pipeline manipulation library (libpipeline) function calls such as pipeline_pump() and pipeline_peek_size().

$ whatis pv
pv (1)               - monitor the progress of data through a pipe
$ pv -Wi 0.002 -cf /etc/hosts | wc -l
 367 B 0:00:00 [2.71MiB/s] [============================================================================>] 100%
10
$

1

One could quickly brew their own solution in Python:

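#!/usr/bin/env python3
import sys

# Read stdin as raw bytes, in chunks, so that multi-byte characters
# are counted per byte and large inputs are handled quickly.
count = 0
while True:
    chunk = sys.stdin.buffer.read(65536)
    if not chunk:
        break
    count += len(chunk)

print(count)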

It works like this:

$ echo "Hi" | ./count_stdin_bytes.py
3
$ echo "Hello" | ./count_stdin_bytes.py
6
$ dd if=/dev/zero bs=1 count=1024 2>/dev/null |  ./count_stdin_bytes.py 
1024

Since in your particular case you're dealing with text data (judging from the fact that you pipe from grep), you could also make use of bash's read. Note that read -n 1 counts characters rather than bytes, and it is much slower than wc -c. Something like this:

$ echo "Hello" | { count=0; while IFS= read -r -n 1 char; do ((count++)); done; echo "$count"; }
6
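
One caveat with the read loop: read -n 1 consumes one character at a time, not one byte, so on non-ASCII input in a multibyte locale its result can differ from wc -c. A quick way to see the byte/character difference, using the two-byte UTF-8 encoding of "é" written as octal escapes:

```shell
# \303\251 is the UTF-8 encoding of "é": two bytes, one character
printf '\303\251' | wc -c   # counts bytes
printf '\303\251' | wc -m   # counts characters according to the current locale
```

In a UTF-8 locale the two commands disagree, which is why wc -c is the safer choice when you want a size in bytes.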
  • Why is this better than wc -c? while read ... will probably be significantly slower. Also, OP asked for human readable output as in (ls -h) – phil294 Dec 05 '19 at 10:16