Recreate folder structure using md5 hashes

Question

I have a long list of md5 checksums with the original paths, plus copies of those files that have been renamed and placed in different folder structures. How can I recreate or recover the original filesystem structure? Assume there are no hash collisions.

```
be70e389a9e000a85826a1a80488e1e1  path/A/2/2.bin
96a48d4706ec8eafff7e56f6784bb6b4  path/B/b1.bin
ffd2e58da118ba6c85de29c4c5b4c1f8  path/C/c1.bin
dbde0b664f88d8027e5cb7efb2cd1060  path/C/2/c2.bin
...
```

Just in case this was what you were thinking, I feel obligated to point you in this direction. — KGIII, Nov 27 '20 at 22:54

score 1 · Answer 1 · answered Nov 27 '20 at 22:06

With bash I would:

1. Iterate over the checksum file with read and store each entry in an associative array, with the hash as the key and the original path as the value.
2. Write the names of all local files to a temporary file (find should be fine for this).
3. Iterate over that list of local files, running md5sum on each one. If the hash is a key in the array, rename the file to the target name stored for it.
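The steps above could be sketched roughly as follows. This is an illustration under a few assumptions that the answer does not state: bash 4 or later (for associative arrays), a checksum list in md5sum's default "HASH  PATH" text format, and a destination directory outside the source directory. The function name restore_tree and its three arguments are made up for this example. It also streams the output of find directly instead of going through a temporary file, and it copies rather than renames so the source files stay untouched.

```shell
#!/usr/bin/env bash
# Sketch only: restore_tree and its arguments are hypothetical names.
# Usage: restore_tree LIST SRC_DIR DEST_DIR
restore_tree() {
  local list=$1 src=$2 dest=$3
  local hash path file target
  local -A target_for=()

  # Step 1: map each hash to its original path.
  # Assumes md5sum's default "HASH  PATH" text format.
  while read -r hash path; do
    [[ -n $hash && -n $path ]] || continue
    target_for[$hash]=$path
  done < "$list"

  # Steps 2 and 3: hash every local file and copy it to its original
  # path when the hash is a key in the array.
  while IFS= read -r -d '' file; do
    read -r hash _ < <(md5sum -- "$file")
    target=${target_for[$hash]:-}
    [[ -n $target ]] || continue
    mkdir -p -- "$dest/$(dirname -- "$target")"
    cp -- "$file" "$dest/$target"
  done < <(find "$src" -type f -print0)
}
```

It would be called as, for example, restore_tree checksums.md5 ./renamed ./restored (all three names are placeholders). If the list contains the same hash under several paths, only the last path is kept by this sketch. Replace cp with mv to rename the files as the answer describes.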

score 0 · Answer 2 · answered Nov 28 '20 at 10:24

I ended up using join for a quick and dirty solution. It assumes that none of the file names involved (the renamed copies or the paths being restored) contain whitespace:

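# ../restore.s is the checksum list, sorted by hash (join needs sorted input)
md5sum * | sort -u -k 1,1 | join - ../restore.s |
  while read -r h r t; do
    # h = hash, r = current (renamed) file, t = original target path
    mkdir -p "$(dirname "tmp/$t")" && cp "$r" "tmp/$t"
  done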

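Both inputs to join have to be sorted on the join field (the hash), and sort -u also drops files with identical content so that each hash appears only once. The output of join is then a series of lines of the form hash source target, which the loop uses to copy each file to its original path.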
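To see what the loop receives, the join step can be tried on its own with small inputs. Everything in the sketch below is made up for illustration: the shortened placeholder hashes, the file names, and the temporary working directory.

```shell
#!/usr/bin/env bash
# Illustration only: the hashes and file names below are made up.
workdir=$(mktemp -d)

# Hashes of the renamed files, as md5sum would print them (unsorted).
printf '%s  %s\n' \
  bbbb renamed_2 \
  aaaa renamed_1 \
  aaaa renamed_1_copy > "$workdir/current.md5"

# Original list, already sorted by hash.
printf '%s  %s\n' \
  aaaa path/A/a.bin \
  bbbb path/B/b.bin > "$workdir/restore.s"

# Same pipeline as in the answer, minus the copy loop.
joined=$(sort -u -k 1,1 "$workdir/current.md5" | join - "$workdir/restore.s")
printf '%s\n' "$joined"
```

This prints two lines, one per hash, each with three space-separated fields: hash, current file name, original path. Only one of the two files sharing the hash aaaa survives sort -u, and it is best not to rely on which one.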
