9

I want to check the results of a restore from backup. This is because Time Machine on macOS gives me some weird errors and warnings, and I want to make sure everything is back in its place after the restore. While I don't trust Time Machine to put every file back, I do trust it to restore the files it does put back with the correct content.

I thought about diff -r, but going through roughly 300 GiB may take forever. I'm fine with comparing just the presence of files, but comparing size and date in the same run would be even better.

I'm aware of solutions like

diff <(ls -R $PATH1) <(ls -R $PATH2)

but the output is difficult to read. I'd rather have a single line per file found on only one side. Also, I have to rely on ls traversing the tree in the same order on both sides, which may not hold because the filesystems may differ.

What I'd love most is a tool for the lazy which takes two paths and outputs differences up to any desired level of inspection, perhaps something out of MacPorts. But I don't fear ample bashisms.

Cyrus
Ariser

2 Answers

26

Here is your solution:

rsync -nrv --delete dirA/ dirB/

Instead of making the two folders identical, we use rsync to only show what it would do. That is the effect of -n. Careful, do not forget to add this option!

The -r requests a recursive scan, and -v gives the verbose listing you want. Add a second -v to list the files that are already identical, too.

The --delete tells rsync to simulate deletion of target files which do not exist in the source. Without the -n flag, the dirB folder would become identical to dirA.

By default, rsync compares only the size and modification time of files with the same name, which is exactly the fast option you were asking for. If you want behavior similar to diff (and equally slow), add the -c flag to force checksum comparison.

rsync -nrvc --delete dirA/ dirB/

Note the trailing slashes in dirA/ and dirB/; they are significant in rsync. For further information, study the rsync man page. It is well worth getting used to this powerful command.
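
To see why each file is flagged, the itemized output can help. The sketch below wraps the same dry run in a small function. The name compare_trees is made up for illustration, and it assumes an rsync that supports -i (--itemize-changes), so it is worth checking the man page of the rsync you have installed.

```shell
# Hypothetical helper (not part of rsync): dry-run comparison of two trees.
compare_trees() {
    # -n        dry run: report only, never copy or delete anything
    # -r        recurse into subdirectories
    # -i        itemize: one line per differing file, with a code for what differs
    # --delete  also report files that exist only in the second tree
    # ${1%/}/   makes sure each path ends in exactly one trailing slash
    rsync -nri --delete "${1%/}/" "${2%/}/"
}

# Usage: compare_trees dirA dirB
```

In the output, lines starting with *deleting name files found only in the second tree. The remaining lines name files that are missing from the second tree or differ from it. The exact layout of the change codes varies between rsync versions.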

5

The "solution" you mention is a really bad one (it can't deal with weird file names for example) and completely unnecessary. Just use diff directly:

diff -r "$PATH1" "$PATH2"

That will recursively (-r) compare the directories and report whether files are present or missing. For example:

$ tree
.
├── dirA
│   ├── file1
│   └── file2
└── dirB
    ├── file1
    └── file3

$ diff -qr dirA dirB
Files dirA/file1 and dirB/file1 differ
Only in dirA: file2
Only in dirB: file3

The -q option (--brief in GNU diff) makes diff report only whether files differ, without printing the differences. In the example above, the files dirA/file1 and dirB/file1 have different contents. This format is about as simple as you can expect and will be pretty fast even for large directories.
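
If reading every file is too slow and presence alone is enough, the directory listings can be compared instead of the contents. What follows is a minimal sketch, assuming file names without newline characters. The function name list_only_on_one_side and the "<" and ">" prefixes are made up for illustration. Sorting both listings removes any dependence on the order in which each filesystem returns its entries.

```shell
# Hypothetical helper: print one line per path that exists in only one tree.
# "< path" means only in the first tree, "> path" only in the second.
# Assumes file names contain no newline characters.
list_only_on_one_side() (
    # Use byte-wise ordering so sort and comm agree on what "sorted" means.
    LC_ALL=C
    export LC_ALL
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT
    # List paths relative to each root so the two listings are comparable.
    (cd "$1" && find . | sort) > "$tmp/a" || exit 1
    (cd "$2" && find . | sort) > "$tmp/b" || exit 1
    # comm -3 drops lines common to both listings; lines unique to the
    # second listing come out with a leading tab, which awk turns into "> ".
    comm -3 "$tmp/a" "$tmp/b" |
        awk '{ if (sub(/^\t/, "> ")) print; else print "< " $0 }'
)

# Usage: list_only_on_one_side dirA dirB
```

For the dirA and dirB example above, this prints "< ./file2" and "> ./file3". Nothing is read beyond the directory entries, so it says nothing about size, date, or content. When a whole directory exists on one side only, every path below it is listed as well.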

terdon
  • 5
    Won't this take a century, because diff will read the whole files when comparing them? – Ariser Nov 30 '14 at 17:19
  • @Ariser well, it took about 40 seconds for a 2.6G directory (compared to a copy of itself) on my machine. You might be able to get something faster if you just find all files and compare md5sums, but that won't make sure that everything is in the right path and will be harder to parse and set up. – terdon Nov 30 '14 at 17:35
  • 2
    OK, I take it as a "there is no tool to compare files by their outer shape". On my machine it takes approx. 8 hours to verify a volume of 300 GiB with diff. As I only have to do it when restoring backups, I can handle it. It takes me just a couple of days until everything is where it belongs. Perhaps I'll try --speed-large-files next time. – Ariser Dec 01 '14 at 17:02
  • @Ariser yes there is a tool that does pretty much what you want. It just checked my 60 GB iPhoto library in a minute. Will post an answer, soon. – Christian Tismer Dec 23 '14 at 14:36