First, a few shell matters:
- Don't use for fname in `find …` as this will mangle file names that contain whitespace or wildcard characters, and will fail (because the command line is too long) if there are too many files with too long names. Use find -exec instead. Since you need shell expansion in the command executed by find, invoke a shell.
- You need double quotes around command substitutions as well as variable substitutions ("$fname", "$(echo …)").
- echo mangles backslashes on a few shells (it also mangles a few arguments beginning with -, but that's not an issue here since all arguments will begin with ./). A way to print any string literally is printf "%s\n" "$fname", or printf "%s" "$fname" to avoid a final newline. Here I see no reason to take the hash of the filename plus a final newline as opposed to the hash of the filename.
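The effect of the quoting and of the trailing newline can be checked on a made-up name. This is only a sketch: the value of fname is hypothetical, and sha1sum from GNU coreutils is assumed to be installed.

```shell
# Hypothetical awkward name: two consecutive spaces and a literal backslash followed by n.
fname='./a  b\nc'

# Unquoted, the shell splits the value at whitespace into several words.
set -- $fname
unquoted_words=$#
# Quoted, it stays a single word.
set -- "$fname"
quoted_words=$#

# Hash of the name alone versus hash of the name plus a final newline.
hash_plain=$(printf '%s' "$fname" | sha1sum | cut -f1 -d' ')
hash_newline=$(printf '%s\n' "$fname" | sha1sum | cut -f1 -d' ')

printf 'words unquoted: %s, quoted: %s\n' "$unquoted_words" "$quoted_words"
printf 'without newline: %s\nwith newline:    %s\n' "$hash_plain" "$hash_newline"
```

The two hashes differ, so whichever convention you pick, stick to it if you ever need to recompute a name.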
So we get this command:
find . -type f -exec sh -c 'mv "$0" "$(printf "%s" "$0" | sha1sum | cut -f1 -d" ").html"' {} \;
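Before renaming anything for real, it can be worth previewing what would happen. Here is a minimal sketch of a dry run, assuming GNU sha1sum. It works in a scratch directory created with mktemp, and the file names are made up for the illustration.

```shell
# Scratch tree with made-up names, so that nothing real is touched.
scratch=$(mktemp -d)
mkdir "$scratch/sub dir"
: > "$scratch/plain.txt"
: > "$scratch/sub dir/with space.txt"

# Same kind of find invocation, but printing the planned rename instead of calling mv.
preview=$(cd "$scratch" && find . -type f -exec sh -c '
  printf "%s -> %s.html\n" "$0" "$(printf "%s" "$0" | sha1sum | cut -f1 -d" ")"
' {} \;)
printf '%s\n' "$preview"

rm -rf "$scratch"
```

Each output line shows the original path and the hashed name it would receive.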
It will be slightly faster to invoke a shell for a whole batch of names at once.
find . -type f -exec sh -c 'for fname; do mv "$fname" "$(printf "%s" "$fname" | sha1sum | cut -f1 -d" ").html"; done' _ {} +
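The _ after the script and the bare for fname may look odd. The first argument after the sh -c script becomes $0, so a placeholder is needed to keep the first file name from landing there. Also, for fname without an in clause loops over the positional parameters. A small sketch with made-up arguments:

```shell
# The first argument after the script text becomes $0; the rest become $1, $2, ...
# "for arg" with no "in" list loops over "$@", i.e. $1 onwards.
looped=$(sh -c 'for arg; do printf "<%s>\n" "$arg"; done' _ 'one' 'two words')
printf '%s\n' "$looped"

# Without the placeholder, the first name is taken as $0 and the loop never sees it.
skipped=$(sh -c 'for arg; do printf "<%s>\n" "$arg"; done' 'one' 'two words')
printf '%s\n' "$skipped"
```

The first call prints both arguments; the second prints only the last one, which is why the placeholder matters.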
A problem with this method is that if mv starts to act before find has finished traversing the directory, files that have already been moved may be picked up by find again and renamed a second time. This is not an issue with your command because it waits for find to finish before it starts moving files. So put the renamed files in a different directory hierarchy. This also solves another problem, which your proposed command shares: mv may overwrite an existing file that happens to be called <sha1sum>.html.
mkdir ../staging
find . -type f -exec sh -c 'for fname; do mv "$fname" ../staging/"$(printf "%s" "$fname" | sha1sum | cut -f1 -d" ").html"; done' _ {} +
find . -depth \! -name "." -type d -exec rmdir {} +
mv ../staging/* .
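To see the whole procedure at work without touching real data, here is a sketch that runs it in a scratch tree. The directory and file names are made up, GNU sha1sum is assumed, and the staging directory sits next to the tree, in the same position as ../staging.

```shell
orig_dir=$PWD
scratch=$(mktemp -d)
mkdir -p "$scratch/tree/a/b" "$scratch/staging"
: > "$scratch/tree/top.txt"
: > "$scratch/tree/a/mid file.txt"
: > "$scratch/tree/a/b/deep.txt"

cd "$scratch/tree" || exit 1
# 1. Move every file into the staging directory under its hashed name.
find . -type f -exec sh -c 'for fname; do mv "$fname" ../staging/"$(printf "%s" "$fname" | sha1sum | cut -f1 -d" ").html"; done' _ {} +
# 2. Remove the now-empty subdirectories, deepest first.
find . -depth \! -name "." -type d -exec rmdir {} +
# 3. Bring the renamed files back into the tree.
mv ../staging/* .

renamed_count=$(find . -type f -name '*.html' | wc -l | tr -d ' ')
leftover_dirs=$(find . \! -name "." -type d | wc -l | tr -d ' ')
printf 'renamed files: %s, leftover directories: %s\n' "$renamed_count" "$leftover_dirs"

# Clean up the scratch area.
cd "$orig_dir" || exit 1
rm -rf "$scratch"
```

Running the same steps on a copy of your real tree first is a cheap way to catch surprises.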
Now on to your main question: in practice, two files with different paths will map to two different SHA-1 hashes. Mathematically speaking, there exist distinct strings with identical SHA-1 hashes (that's obvious since there are infinitely many strings but only finitely many hashes). However, you will not stumble on such a pair by accident: the only known SHA-1 collisions (the first was published in 2017) were deliberately constructed at considerable computational cost. So your procedure is safe against accidental collisions, but not necessarily against a malicious attacker who can choose file names. If that matters to you, use a hash that is currently considered secure, such as SHA-256 (sha256sum).
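To put a rough number on the risk of an accidental collision, here is a back-of-the-envelope birthday estimate. It assumes that SHA-1 behaves like a uniformly random 160-bit function on your file names, and the figure of a billion files is only an illustration.

```latex
P(\text{at least one collision among } n \text{ names})
  \approx \frac{n(n-1)}{2 \cdot 2^{160}}
  \approx \frac{n^{2}}{2^{161}},
\qquad
n = 10^{9}: \quad \frac{10^{18}}{2^{161}} \approx 3 \times 10^{-31}.
```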
As for your second question: the hash is fully determined by the string you hash. So if you have two files called tweedledum/staple and tweedledee/staple and you run that renaming procedure from each of the directories tweedledum and tweedledee in turn, then both directories will end up with a file called 1c0ee9c1eed005a476403c7651b739ae5bc7cf2a.html. If you want to have different names, you need to put some distinguishing content in the hashed text, such as the name of the directory.
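This is easy to check directly. In the sketch below, hash_name is a hypothetical helper introduced for the illustration (it is not part of the commands above), and GNU sha1sum is assumed.

```shell
# Hypothetical helper: print the SHA-1 of a string, without a trailing newline in the input.
hash_name() {
  printf '%s' "$1" | sha1sum | cut -f1 -d' '
}

# Run from inside either directory, find reports the same string, so the hash is the same.
from_dum=$(hash_name './staple')
from_dee=$(hash_name './staple')

# With the directory name included in the hashed text, the results differ.
with_dum=$(hash_name 'tweedledum/staple')
with_dee=$(hash_name 'tweedledee/staple')

printf 'same string:       %s %s\n' "$from_dum" "$from_dee"
printf 'with directory:    %s %s\n' "$with_dum" "$with_dee"
```

One way to get the second behaviour, among others, is to run find from the common parent directory so that the directory name is part of each path.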