I've got the following script:
#!/usr/bin/env bash
# Script to generate MD5 hash for each line.
[ $# -eq 0 ] && { echo "Usage: $0 file"; exit 1; }
file=$1
shopt -s expand_aliases
alias calc_md5='while IFS= read -r line; do md5sum <<<"$line"; done'
paste <(sort "$file" | uniq | calc_md5) <(sort "$file" | uniq)
times
which prints the MD5 checksum of each line next to the line itself, which is exactly what I need. For example:
$ ./md5_lines.sh file.dat
5c2ce561e1e263695dbd267271b86fb8 - line 1
83e7cfc83e3d1f45a48d6a2d32b84d69 - line 2
0f2d633163ca585e5fc47a510e60f1ff - line 3
73bb3632fc91e9d1e1f7f0659da7ec5c - line 4
The problem with the above script is that it reads and sorts the file twice, once for each column/stream. Ideally, I'd like to sort and deduplicate the lines once and use the result as the input for both columns.
How can I convert the above script to parse the file only once (sort & uniq), then redirect the output to two different streams and display the lines side by side, so that it works faster for larger files?
Here is another attempt:
tee >(calc_md5) >(cat -) \
< <(sort "$file" | uniq) \
>/dev/null
times
but it prints the streams separately (not side-by-side).
Ideally, I'd like to use paste the same way as tee, however it gives me the following error:
$ paste >(cat -) >(cat -) </etc/hosts
paste: /dev/fd/63: Permission denied
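As far as I understand, the error above occurs because >(cmd) expands to a /dev/fd path that is connected to the command's standard input, so it can only be written to, while paste tries to open its arguments for reading. For reference, here is a minimal single-pass sketch that avoids the second stream altogether by printing the hash and the line from the same loop. The function name md5_lines is hypothetical (it is not part of the original script), and the sketch assumes GNU md5sum is available and that sort -u is an acceptable stand-in for sort | uniq.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): hash every unique line
# of a file in a single pass and print "<md5sum output><TAB><line>".
md5_lines() {
  # sort -u sorts and deduplicates in one step, so the file is read only once.
  sort -u -- "$1" | while IFS= read -r line; do
    # The here-string appends a newline, so the hash covers "line\n",
    # the same bytes the original calc_md5 alias feeds to md5sum.
    printf '%s\t%s\n' "$(md5sum <<<"$line")" "$line"
  done
}
```

Calling md5_lines file.dat should give one row per unique line. This still starts one md5sum process per line, so it only removes the duplicate sort and read, not the per-line process cost.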
md5sum to simplify the script, my original use case was to find partial SHA conflicts by: alias calc_sha="php -r 'while(\$line = fgets(STDIN)){ echo substr(sha1(strtok(\$line, PHP_EOL)), 6, 9) . PHP_EOL; };'" in a large list of IDs (but it's the topic for another question), so ideally I'd like to not invoke separate instances for performance reasons, but work with streams. But any information about dealing with multiple streams is useful. – kenorb Mar 04 '18 at 21:03
md5sum -c in "check" mode for some reason. Although I would still find it more readable and cleaner to define an equivalent function instead of an alias. – David Foerster Mar 04 '18 at 23:19