Edit: corrected, and clarified the options; I had forgotten '--brief'.
diff -rs --brief "$dir1" "$dir2"
  -r, --recursive               recursively compare any subdirectories found
  -s, --report-identical-files  report when two files are the same
  -q, --brief                   report only when files differ
      --speed-large-files       assume large files and many scattered small changes
And add other options to taste, depending on what you are comparing:
  -i, --ignore-case             ignore case differences in file contents
  -b, --ignore-space-change     ignore changes in the amount of white space
  -B, --ignore-blank-lines      ignore changes whose lines are all blank
      --strip-trailing-cr       strip trailing carriage return on input
      --ignore-file-name-case   ignore case when comparing file names
diff -rs will read every byte of the original and copy,
and report files that are the same.
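A minimal, self-contained sketch of the command above, run against two small throwaway trees. The directory and file names are hypothetical stand-ins for your real original and copy, and the exact paths in the report depend on where mktemp puts them. Note that diff exits with status 1 when it finds any difference, which matters if you call it from a script that uses set -e.

```shell
#!/bin/sh
# Hypothetical demo trees; substitute your real original and copy.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dir1="$tmp/orig"
dir2="$tmp/copy"
mkdir -p "$dir1/sub" "$dir2/sub"
printf 'hello\n' > "$dir1/same.txt"
printf 'hello\n' > "$dir2/same.txt"
printf 'one\n' > "$dir1/sub/changed.txt"
printf 'two\n' > "$dir2/sub/changed.txt"

# -r: recurse, -s: report identical files, --brief: no content diffs.
diff -rs --brief "$dir1" "$dir2" > "$tmp/report.txt"
status=$?    # 0 = no differences, 1 = differences found, >1 = trouble
cat "$tmp/report.txt"
echo "diff exit status: $status"
```

The report should contain one "are identical" line for same.txt and one "differ" line for sub/changed.txt.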
The diff output format is defined by POSIX, so it is pretty
portable. You may want to add something like:
| tee diff-out.1 | grep -v -Ee 'Files .* and .* are identical'
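A sketch of the full pipeline, again on hypothetical throwaway trees: tee keeps the complete report in diff-out.1, and the grep -v leaves only the lines that need attention (files that differ, and "Only in" lines for files missing from one side). The 2>&1 is an addition of mine, so that any read errors from diff also end up in the saved report.

```shell
#!/bin/sh
# Hypothetical demo trees; substitute your real original and copy.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dir1="$tmp/orig"
dir2="$tmp/copy"
mkdir -p "$dir1" "$dir2"
printf 'a\n' > "$dir1/same.txt";    printf 'a\n' > "$dir2/same.txt"
printf 'a\n' > "$dir1/changed.txt"; printf 'b\n' > "$dir2/changed.txt"
printf 'a\n' > "$dir1/only-in-orig.txt"    # missing from the copy

cd "$tmp" || exit 1
# Full report goes to diff-out.1; only non-"identical" lines reach problems.txt.
diff -rs --brief "$dir1" "$dir2" 2>&1 | tee diff-out.1 |
    grep -v -Ee 'Files .* and .* are identical' > problems.txt
cat problems.txt
```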
You could use checksums or hashes, but then you have to keep them
synced with the file trees, so you would be back to reading every byte
of every file anyway.
EDIT - too long to be a comment, in response to:
files over 10GB are not verifying
You may want to try this diff option: --speed-large-files
It is possible that the diff you are using is not coping well with
very large files (bigger than system memory, for instance), and is
thus reporting differences between files that are actually the same.
I had thought there was a -h option or a 'bdiff' that did better on
large files, but I cannot find one in Fedora. I believe that the
--speed-large-files option is a successor to a '-h' "half-hearted
compare" option.
A different approach would be to repeat the rsync command you used,
with '-vin' (verbose, itemize changes, dry run). This would report any
differences that rsync finds, and there should not be any.
To move some files, you're looking at a script something like:
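# cmp -s prints nothing; its exit status alone says whether the files match
if cmp -s "$dir1/$path" "$dir2/$path" ; then
    target="$dir2/verified/$path"
    mkdir -p "$(dirname "$target")"   # create the target's parent directory
    mv "$dir2/$path" "$target"
fi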
but I don't recommend doing that. The underlying question is "how
can I be sure that rsync copied a file hierarchy correctly?"
and if you can demonstrate to yourself that rsync is working
well, with diff or some other tool, then you can just rely
on rsync, rather than working around it.
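If you do decide to try it anyway, here is one way that fragment could be wrapped in a loop. This is a sketch, not a recommendation: it walks the original tree with find, assumes no file names contain newlines, and uses hypothetical demo trees in place of your real directories. cmp's exit status is used directly as the if condition, and dirname creates the target's parent directory. Files that differ, or that are missing from the copy, are left where they are.

```shell
#!/bin/sh
# Hypothetical demo trees; substitute your real original and copy.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
dir1="$tmp/orig"
dir2="$tmp/copy"
mkdir -p "$dir1/sub" "$dir2/sub"
printf 'x\n' > "$dir1/sub/good.txt"; printf 'x\n' > "$dir2/sub/good.txt"
printf 'x\n' > "$dir1/bad.txt";      printf 'y\n' > "$dir2/bad.txt"

# List paths relative to dir1 (e.g. "./sub/good.txt"); assumes no newlines in names.
( cd "$dir1" && find . -type f ) | while IFS= read -r path; do
    path=${path#./}                        # strip the leading "./"
    # cmp -s exits 0 only when both files exist and are byte-for-byte identical
    if cmp -s "$dir1/$path" "$dir2/$path"; then
        target="$dir2/verified/$path"
        mkdir -p "$(dirname "$target")"    # create the target's parent directory
        mv "$dir2/$path" "$target"
    fi
done
find "$dir2" -type f | sort
```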
rsync -vin will compare based on whatever other options you give it.
I thought it defaulted to checksum, but you are right,
-c or --checksum is required for that.
The diff utility is really intended for files of lines of text,
but it should report 'identical' under -s for binary files.
The --brief option should suppress any file content output - my apologies
for overlooking it earlier - it was semi-buried in an ugly script.
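A quick way to check the binary-file behavior for yourself, using hypothetical temporary files that contain NUL bytes (which GNU diff treats as binary). With -s, identical binary files should be reported as identical; with --brief, a difference should come back as a single "differ" line rather than any content output.

```shell
#!/bin/sh
# Hypothetical binary test files containing NUL bytes.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'A\000B\000' > "$tmp/a.bin"
printf 'A\000B\000' > "$tmp/b.bin"   # same bytes as a.bin
printf 'A\000C\000' > "$tmp/c.bin"   # one byte different

same=$(diff -s --brief "$tmp/a.bin" "$tmp/b.bin")
other=$(diff -s --brief "$tmp/a.bin" "$tmp/c.bin")
echo "$same"
echo "$other"
```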
rsync copied the data at around 150MB/s, yet diff compares at only 60MB/s ... ? – d0g Jan 28 '14 at 06:19
rsync is faster b/c rsync by default does not use checksums to compare files; it looks at size and date info. When you use rsync -c, all the files need to have their checksums calculated, which is a burdensome task, hence why it's not the default. – slm Jan 28 '14 at 07:37