2

I have a lot of files in a directory:

$ ls
file000001
file000002
# ... truncated ...
file999999

I am calculating the md5sum of the files like this and finally dumping it to a file:

hashes=''
for file in $(ls); do
  hashes+=$(md5sum $file)
  hashes+="\n"
done

echo "$hashes" > hashes.txt

Now, I would like to press Ctrl + C while the execution of the script is within the for loop and have the contents of hashes dumped to the hashes.txt file. Is this possible?

(Yes, I can append the md5sum to hashes.txt every time the md5sum of each file is calculated, but I intend to do it this way (as shown above).)

Note that the example code above is terrible. I actually used md5sum as an example; I am doing some other stuff. The intention of my question is actually to find out how to make Ctrl + C work.

ilkkachu
GMaster
  • I understand that this was not the point of the Q, but FYI md5sum does always line buffer its output, so simply md5sum f1 f2 f3 huge-f4 > list will leave you with the sums of f1, f2 and f3 in list if you press ^C while it's processing huge-f4. –  Sep 15 '19 at 14:48

3 Answers

3
trap ctrl_c INT

ctrl_c() {
    echo "$hashes" > hashes.txt
    exit 0
}

BTW, I would not use ls to get the list of files, but something like for file in *; do.
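
For context, here is a minimal sketch of how the handler above might sit in a complete script. The temporary demo directory, the file names in it, and the helper name dump_hashes are assumptions added for illustration; they are not part of the original answer. The sketch uses printf '%s' rather than echo so the output file has no extra trailing blank line.

```bash
#!/usr/bin/env bash
# Demo files in a temporary directory (an assumption for illustration).
demo_dir=$(mktemp -d)
for n in 1 2 3; do
    printf '%s\n' "$n" > "$demo_dir/file.$n"
done

hashes=''

# Hypothetical helper: write whatever has been collected so far.
dump_hashes() {
    printf '%s' "$hashes" > hashes.txt
}

# On Ctrl+C (SIGINT): dump the partial results, then stop.
ctrl_c() {
    dump_hashes
    exit 0
}
trap ctrl_c INT

for file in "$demo_dir"/*; do
    hashes+=$(md5sum -- "$file")$'\n'
done

# Normal completion: dump the full results.
dump_hashes
```

With only three tiny files the loop finishes before you can press Ctrl+C. Pointed at a large directory, interrupting it should leave the hashes collected so far in hashes.txt.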

2

Set trap break INT before your loop:

hashes=
trap break INT
for file in *; do
  hashes+=$(md5sum -- "$file")$'\n'
done
trap - INT

echo "$hashes" > hashes.txt

I've corrected some dubious stuff in your script ($(ls), "\n").

FWIW, such a "nonlocal break" doesn't work in mksh or yash, but it does in bash, dash, zsh and ksh93.

  • why would that be "safer"? if you have any technical objections, please put up and I'll include them in the answer, and if they're serious enough I'll just delete the whole thing. Notice that this Q is tagged [bash] and my solution was explicitly meant as a simpler alternative to the code duplication and logic meandering that doing stuff in traps (as in the 1st answer) mean. –  Sep 09 '19 at 10:03
  • I meant "safer" in the sense of supporting other shells, I didn't say there was anything wrong with it on Bash. I should have said "more portable", sorry. Also, as I said, I didn't see the other answer (but I also didn't see you refer to that solution) – ilkkachu Sep 09 '19 at 10:17
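
If portability matters, one alternative that avoids break inside a trap is to have the handler only set a flag and let the loop check it. The sketch below assumes that approach and sticks to POSIX sh syntax. The variable names interrupted and count, the demo directory, and the self-sent signal (kill -INT "$$", standing in for a real Ctrl+C) are all illustrative assumptions.

```sh
#!/bin/sh
# Demo files in a temporary directory (an assumption for illustration).
demo_dir=$(mktemp -d)
for n in 1 2 3; do
    printf '%s\n' "$n" > "$demo_dir/file.$n"
done

hashes=''
interrupted=''

# The handler only records that SIGINT arrived.
trap 'interrupted=1' INT

count=0
for file in "$demo_dir"/*; do
    # Append the hash line plus a literal newline.
    hashes="$hashes$(md5sum -- "$file")
"
    count=$((count + 1))

    # Demo only: simulate Ctrl+C after the second file.
    if [ "$count" -eq 2 ]; then
        kill -INT "$$"
    fi

    # Leave the loop once the handler has set the flag.
    if [ -n "$interrupted" ]; then
        break
    fi
done
trap - INT

printf '%s' "$hashes" > hashes.txt
```

In real use you would drop the count and kill lines and press Ctrl+C yourself. The dump still happens in one place, after the loop.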
1

You should never use the output of ls as input to another program. It doesn't work (for any definition of "work" that includes the concepts of reliability or safety), and there is never any need to do so because the shell's globbing (i.e. wildcard expansion) does the job you are trying to do by misusing ls. See "Why not parse ls (and what to do instead)?".

If you must do something similar, use find ... -exec {} + or find ... -print0 | xargs -0r ... rather than ls - either of them will actually work, and work safely no matter what characters are in the filenames.
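
As a rough illustration of those two forms applied to md5sum, here is a small sketch. The demo directory, its file names, and the second output file hashes_xargs.txt are assumptions for the example. Note that -print0, -0 and -r go beyond older POSIX; GNU findutils supports all three, other implementations may differ.

```bash
#!/usr/bin/env bash
# Demo files, one with a space in its name (an assumption for illustration).
demo_dir=$(mktemp -d)
printf 'a\n' > "$demo_dir/plain"
printf 'b\n' > "$demo_dir/with space"

# Form 1: find batches the file names and passes them to md5sum directly.
find "$demo_dir" -type f -exec md5sum {} + > hashes.txt

# Form 2: NUL-delimited names piped to xargs; -r skips the run if there is no input.
find "$demo_dir" -type f -print0 | xargs -0r md5sum > hashes_xargs.txt
```

Both files should end up with one line per file, including the name that contains a space.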

Your for loop should be written as:

for file in *; do
  ...
done

You don't even need a for loop unless you are going to do something else with each $file within the loop.

Here are some alternative ways to do what your for loop does, by storing the output of md5sum into an array called hashes, and then printing it:

  1. using printf:

    $ hashes=( $( md5sum * ) )
    $ printf '%s  %s\n' "${hashes[@]}" > hashes.txt
    $ cat hashes.txt
    b026324c6904b2a9cb4b88d6d61c81d1  file.1
    26ab0db90d72e28ad0ba1e22ee510510  file.2
    6d7fce9fee471194aa8b5b6e47267f03  file.3
    

    Note that there are two %s separated by two spaces in the printf format string. This will produce the same output as md5sum itself.

    BTW, you can see how the entries are stored in the array with typeset -p:

    $ typeset -p hashes
    declare -a hashes=([0]="b026324c6904b2a9cb4b88d6d61c81d1" [1]="file.1"
    [2]="26ab0db90d72e28ad0ba1e22ee510510" [3]="file.2"
    [4]="6d7fce9fee471194aa8b5b6e47267f03" [5]="file.3")
    

    The shell has split the output of md5sum into words (as defined by the value of the shell's $IFS variable) and put each "word" in a separate element of the array.

  2. using tee:

    $ hashes=( $( md5sum * | tee hashes.txt) )
    $ cat hashes.txt
    b026324c6904b2a9cb4b88d6d61c81d1  file.1
    26ab0db90d72e28ad0ba1e22ee510510  file.2
    6d7fce9fee471194aa8b5b6e47267f03  file.3
    

Knowing this, Lasse Kliemann's trap answer could be written as:

trap ctrl_c INT

ctrl_c() {
    printf '%s  %s\n' "${hashes[@]}" > hashes.txt
    exit 0
}

However, you don't even need a trap here. md5sum with tee will do what you want - i.e. simultaneously populate $hashes AND the hashes.txt file.

Or, if you don't want to use an array:

$ for file in *; do
    hashes+="$(md5sum "$file" | tee -a hashes.txt)"$'\n'
  done

$ echo "$hashes"
b026324c6904b2a9cb4b88d6d61c81d1  file.1
26ab0db90d72e28ad0ba1e22ee510510  file.2
6d7fce9fee471194aa8b5b6e47267f03  file.3

Here we're using tee's -a (--append) option because we're running md5sum in a loop, and don't want each iteration to overwrite the hashes.txt file.

Note: $hashes will always end with an extra newline here, so echo "$hashes" prints one blank line at the end, increasing the line count by 1. This will not be the case when using an array.
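
One caveat with the array examples above: because the unquoted $( ... ) is split on whitespace, a file name containing a space ends up spread over several elements. A possible workaround, sketched here assuming bash 4 or later (for mapfile), stores each md5sum output line as one element. The demo directory and file names are made up for illustration.

```bash
#!/usr/bin/env bash
# Demo files, one with a space in its name (an assumption for illustration).
demo_dir=$(mktemp -d)
printf '1\n' > "$demo_dir/file 1"
printf '2\n' > "$demo_dir/file.2"

# One array element per md5sum output line; -t strips the trailing newline.
mapfile -t hashes < <(md5sum -- "$demo_dir"/*)

# Each element is already a complete "hash  name" line.
printf '%s\n' "${hashes[@]}" > hashes.txt
printf 'elements: %s\n' "${#hashes[@]}"
```

With this layout the printf format needs only one %s per line, and the element count matches the number of files.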

cas
  • For an answer that essentially contains just the same solution as another answer (the trap), this one sure has a lot of irrelevant filler. for f in $(ls) is obviously silly, but it's hardly even close to the main point that was being asked about. Using an array or printf doesn't really change things either (at least you didn't show how it would), and tee does exactly what they said they didn't want: fills the output file continuously. – ilkkachu Sep 09 '19 at 09:16
  • did you see the part of the question where they explicitly say they don't want the continuous updating? "Yes, I can append the md5sum to hahes.txt everytime the md5sum of each file is calculated but I intend to do it this way (as shown above)."? As for challenging that, you didn't even say why the continuous update would be better. And yes, I think that's pretty much a requirement for a frame challenge. – ilkkachu Sep 09 '19 at 09:24
  • it's one thing to suggest an alternative solution, but to go off in another direction, ignoring the question that was asked, and claiming "it should be obvious" is just arrogant. Apart from passing that "\n" to echo, they get the same output as you did with their loop, and even that works if they have xpg_echo set. There's also no word splitting of the md5sum output in the original. – ilkkachu Sep 09 '19 at 09:50
  • What's obvious to you, might not be obvious to everyone else. It's not just me reading your answer, there are potentially many others. Not too many yet, it says "viewed 41 times", but that's still a lot more than just me. In some cases arrays are more useful than singular strings, sure, but if all they want to do is collect the outputs of some command invocations, then no, I can't see the advantage of using one. I can't see a "simulation of an array" either, it's not like they're trying to index into that string to extract a particular line or anything like that. – ilkkachu Sep 09 '19 at 10:44
  • Thanks for your detailed explanation. However, I was not looking for an alternate solution. – GMaster Sep 10 '19 at 15:02
  • No problem. Someone else who reads this might gain something useful from it. – cas Sep 10 '19 at 15:11