0

A previous answer to a post suggests computing sha1 hashes of a dd drive clone image.

Another answer to that post suggests mounting the dd image and then checking whether the sha1 hashes of "important files" match.

I want to use the second approach, but instead of manually selecting files I would like to use a bunch of randomly selected files.

Assuming I have two mounted partitions, can I select a bunch of random files and compare the sha1 hashes and stop with an error if a hash is not equal?

If all goes well, the output should be roughly similar to this:

OK: all 10000 randomly selected files have matching sha1 sums for partitions sda and sdb

Alternatively, output should appear only in case of an error and show the name of the file whose sha1 sum differs between the two partitions.

Current code in progress:

#!/bin/bash

N=5
mydir="/home"

# Pick N random regular files; read line by line so names with spaces survive.
find "$mydir" -type f | sort -R | tail -n "$N" |
while IFS= read -r fname
do
    echo "$fname"
done
mrsteve
  • 103
  • 1
    OK, so what's your question? Which part of this is giving you problems? How much have you managed to do? – terdon Sep 27 '14 at 11:31
  • If you already have a file with a filename + hash on each line, you can determine a threshold that a random number should exceed so that about 1000 entries are selected. Then check just those. – Anthon Sep 27 '14 at 11:33

2 Answers

3

As I understand your question, you want to find out whether N random files differ between two file system paths. Comparing the files directly with cmp should be faster than calculating checksums of both files, because cmp can stop at the first differing byte. Here is how you can do it:

#!/bin/sh
list1=/tmp/list1
list2=/tmp/list2
shuflist=/tmp/shuflist
n=100000 # How many files to compare.
if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "Usage: $0 path1 path2" >&2
    exit 1
fi
exitcode=0
# List regular files relative to each root, sorted so the lists are comparable.
(cd "$1" && find . -type f | sort >"$list1") || exit 1
(cd "$2" && find . -type f | sort >"$list2") || exit 1
if cmp -s "$list1" "$list2"; then
    # Pick up to n random file names and compare each pair byte by byte.
    shuf -n "$n" "$list1" > "$shuflist"
    while IFS= read -r filename; do
        if ! cmp -s "$1/$filename" "$2/$filename"; then
            echo "Files '$1/$filename' and '$2/$filename' differ."
            exitcode=1
            break
        fi
    done < "$shuflist"
else
    echo "File lists differ."
    exitcode=1
fi
rm -f "$list1" "$list2" "$shuflist"
exit $exitcode

Beware that this script assumes that none of your file names contain a newline character.
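If you specifically want sha1 sums and the "OK" line from the question, the same idea can be sketched roughly as follows. This is only an illustration, not a tested tool: compare_sha1_sample is a made-up helper name, the default of 10000 files is taken from the example output in the question, and it assumes GNU coreutils (shuf, sha1sum, mktemp) and file names without newlines.

```shell
#!/bin/sh
# Sketch: compare sha1 sums of up to N randomly chosen files under two roots.
# compare_sha1_sample is a hypothetical helper name, not an existing command.
compare_sha1_sample() {
    dir1=$1 dir2=$2 n=${3:-10000}
    sample=$(mktemp) || return 2
    # Pick up to n random regular files, as paths relative to dir1.
    (cd "$dir1" && find . -type f | shuf -n "$n") >"$sample"
    status=0 count=0
    while IFS= read -r f; do
        if [ ! -f "$dir2/$f" ]; then
            echo "ERROR: '$f' is missing under $dir2" >&2
            status=1
            break
        fi
        # Hash via stdin so that only the digest is printed, not the path.
        sum1=$(sha1sum <"$dir1/$f" | cut -d' ' -f1)
        sum2=$(sha1sum <"$dir2/$f" | cut -d' ' -f1)
        if [ "$sum1" != "$sum2" ]; then
            echo "ERROR: sha1 sums differ for '$f'" >&2
            status=1
            break
        fi
        count=$((count + 1))
    done <"$sample"
    rm -f "$sample"
    [ "$status" -eq 0 ] && echo "OK: all $count randomly selected files have matching sha1 sums for $dir1 and $dir2"
    return "$status"
}
```

It could be called as compare_sha1_sample /mnt/sda /mnt/sdb 10000, where the two mount points are placeholders for wherever your partitions are mounted. The function stops at the first mismatch and returns a non-zero status, so it can be used in an if statement or with set -e.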

nwk
  • 1,009
1

If you want to compare two directories with files and subdirectories, then the diff command can do that for you. Running diff -rq /path/to/fileset1 /path/to/fileset2 tells you which files differ and which files are present in only one of the trees. You could even extend the command to give a detailed list of all the changes for auditing purposes, for example with diff -rNau /path/to/fileset1 /path/to/fileset2, and redirect the output to wherever you want to store the changes.
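As a rough sketch of how the exit status of diff could be turned into the pass/fail message asked for in the question (compare_trees is a hypothetical helper name, and -q is assumed to be available, as it is in GNU diff):

```shell
#!/bin/sh
# Sketch: report differences between two trees and map diff's exit status
# to a verdict. compare_trees is a hypothetical helper name.
compare_trees() {
    status=0
    # -r: recurse into subdirectories, -q: only report which files differ.
    diff -rq "$1" "$2" || status=$?
    case $status in
        0) echo "OK: $1 and $2 have identical contents" ;;
        1) echo "ERROR: differences found between $1 and $2" >&2; return 1 ;;
        *) echo "ERROR: diff could not compare $1 and $2" >&2; return 2 ;;
    esac
}
```

Note that this compares every file in both trees rather than a random sample, so on large partitions it will take considerably longer than a sampled check.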

hspaans
  • 562