I want to get both the number of bytes and the sha1sum of a command's output.
In principle, one can always do something like:
BYTES="$( somecommand | wc -c )"
DIGEST="$( somecommand | sha1sum | sed 's/ .*//' )"
...but, for the use-case I am interested in, somecommand is rather time-consuming and produces a ton of output, so I'd prefer to call it only once.
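One way that comes to mind would be something like:
evil() {
    # The byte count goes to stderr (>&2) so that it bypasses sha1sum;
    # the outer 2>&1 then merges it back into the captured output.
    {
        somecommand | \
            tee >( wc -c | sed 's/^/BYTES=/' >&2 ) | \
            sha1sum | \
            sed 's/ .*//; s/^/DIGEST=/'
    } 2>&1
}
eval "$( evil )"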
...which seems to work, but makes me die a little inside.
I wonder if there is a better (more robust, more general) way to capture the output of different segments of a pipeline into separate variables.
EDIT: The problem I am working on at the moment is in bash, so I am mostly interested in solutions for this shell, but I do a lot of zsh programming also, so I have some interest in those solutions as well.
EDIT2: I tried to port Stéphane Chazelas' solution to bash, but it didn't quite work:
#!/bin/bash
cmd() {
    printf -- '%1000s'
}
bytes_and_checksum() {
    local IFS
    cmd | tee >(sha1sum > $1) | wc -c | read bytes || return
    read checksum rest_ignored < $1 || return
}
set -o pipefail
unset bytes checksum
bytes_and_checksum "$(mktemp)"
printf -- 'bytes=%s\n' $bytes
printf -- 'checksum=%s\n' $checksum
When I run the script above, the output I get is
bytes=
checksum=96d89030c1473585f16ec7a52050b410e44dd332
The value of checksum is correct. I can't figure out why the value of bytes is not set.
EDIT3: OK, thanks to @muru's tip, I fixed the problem:
#!/bin/bash
cmd() {
    printf -- '%1000s'
}
bytes_and_checksum() {
    local IFS
    read bytes < <( cmd | tee >(sha1sum > $1) | wc -c ) || return
    read checksum rest_ignored < $1 || return
}
set -o pipefail
unset bytes checksum
bytes_and_checksum "$(mktemp)"
printf -- 'bytes=%s\n' $bytes
printf -- 'checksum=%s\n' $checksum
Now:
bytes=1000
checksum=96d89030c1473585f16ec7a52050b410e44dd332
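As far as I understand it, the difference between the two versions comes down to where read runs: at the end of a pipeline it runs in a subshell, so the variable it sets is lost, whereas with process substitution it runs in the current shell. A minimal sketch of just that difference, assuming bash with its default options (lastpipe off); the names n1 and n2 are made up for illustration:

```shell
#!/bin/bash
unset n1 n2

# read is the last segment of a pipeline, so it runs in a subshell
# and n1 is not set in the current shell.
printf '1000\n' | read n1

# read runs in the current shell; only the producer is in a subprocess.
read n2 < <(printf '1000\n')

printf 'n1=%s\n' "${n1-}"   # prints "n1="
printf 'n2=%s\n' "${n2-}"   # prints "n2=1000"
```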
UNFORTUNATELY...
...my bytes_and_checksum function stalls (deadlock?) when cmd produces a lot more output than was the case in my toy example above.
Back to the drawing board...
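One fallback that sidesteps concurrent readers altogether is to spool the output to a temporary file once and then read that file twice. This is only a sketch, assuming the output fits on disk and that GNU-style wc and sha1sum are available; the somecommand function below is a hypothetical stand-in for the real, expensive command:

```shell
#!/bin/bash
# Hypothetical stand-in for the real, expensive command.
somecommand() {
    printf -- '%1000s'
}

tmp="$(mktemp)" || exit
trap 'rm -f -- "$tmp"' EXIT

# Run the expensive command exactly once, spooling its output to disk.
somecommand > "$tmp" || exit

# Both values are then computed from the file, one after the other.
BYTES="$(wc -c < "$tmp")"
BYTES="${BYTES//[[:space:]]/}"   # some wc implementations pad with spaces
DIGEST="$(sha1sum < "$tmp")"
DIGEST="${DIGEST%% *}"           # keep only the hash, drop the trailing "  -"

printf -- 'BYTES=%s\n' "$BYTES"
printf -- 'DIGEST=%s\n' "$DIGEST"
```

This trades disk space and a second read of the data for simplicity: there is no eval and no process substitution, so nothing can block on a pipe.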
[...] somecommand in a variable, or write it to a temporary file if it's prohibitively large? – Marcus Müller Nov 02 '23 at 15:12
[...] eval? simply call your function. – Marcus Müller Nov 02 '23 at 15:13
[...] BYTES and DIGEST. – kjo Nov 02 '23 at 15:16
[...] cmd | read bytes to work in bash, you need shopt -s lastpipe with set +o monitor – Stéphane Chazelas Nov 04 '23 at 08:52
ps would help to show which processes hang around, and strace what they're doing/waiting for. – Stéphane Chazelas Nov 04 '23 at 11:39
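The lastpipe suggestion from the comments can be sketched as follows. This assumes bash 4.2 or later (older versions do not have the lastpipe option) and reuses the toy 1000-byte producer from the question in place of a real command:

```shell
#!/bin/bash
# lastpipe makes the last segment of a pipeline run in the current shell.
# It only takes effect when job control is off, hence set +o monitor
# (job control is already off by default in non-interactive scripts).
shopt -s lastpipe
set +o monitor

unset bytes
printf -- '%1000s' | wc -c | read bytes

printf -- 'bytes=%s\n' "$bytes"   # prints "bytes=1000"
```

With this in place, the cmd | ... | read bytes form from EDIT2 can set bytes in the calling shell without wrapping the pipeline in a process substitution.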