I've been trying to clone a 1TB HDD to a 1TB SSD following the instructions here.

I tried cloning it several times with both dd (painfully slow) and cat (far quicker), but when I ran checksums afterwards they didn't match at all.

I used the following command, taken from here: sudo sha1sum /dev/sdX

The drive was set up with an unencrypted boot partition, followed by three additional partitions encrypted with dm-crypt/LUKS.

All operations were performed via a LiveCD with both drives unmounted.

On testing (after finding the checksums didn't match) the duplicate drive does appear to function correctly, but I'm suspicious that things may be missing or corrupted.

So my questions are:

What would cause checksums to not match on identically sized drives?

And secondly, would the encryption make a difference, and if so how do you perform checksums on partially encrypted devices?

Mark
  • When cloning a drive, you're almost always better off cloning the files rather than doing a bit-wise copy of the disks: create an identical partition and filesystem structure on the target drive, and use rsync. When it's finished, you'll have to chroot into the target root fs and run grub-install. You'll probably have to change UUIDs etc. in the target /etc/fstab, too. – cas Oct 08 '15 at 22:53

2 Answers

The problem with hashing a disk image is that it's a one-bit measure: it tells you only whether or not the copy is byte-for-byte perfect. And with disk images in particular, which contain filesystems, there's very little reason to expect them to stay byte-for-byte identical. Even after a straight mirroring, any single error, even an inconsequential one, would break the match, as would any change to either disk, including changes that come from partition manipulation, mounting the filesystems, or anything else that writes to the device.

More useful would be to mount the filesystems involved, then do something like cd /mnt/mountpoint; find . -type f -exec sha256sum {} \+ >~/checksums. Then you can mount the second disk, cd into its mount point, and run sha256sum -c ~/checksums. This will tell you which files, if any, are altered. (It's very possible that no files were changed, and the change on disk is in FS metadata or partition boundaries or something else not really significant.)
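
The per-file comparison described above could be wrapped in two small shell functions. This is only a minimal sketch, assuming GNU coreutils (sha256sum) and a find that supports -exec ... {} +. The function names make_manifest and verify_manifest are made up for illustration, and the mount points in the usage note are placeholders.

```shell
# Hypothetical helpers; the names are illustrative and not part of any tool.

# make_manifest DIR OUT: write a SHA-256 line for every regular file under DIR.
make_manifest() {
    # The subshell keeps the cd local; paths are recorded relative to DIR.
    ( cd "$1" && find . -type f -exec sha256sum {} + ) > "$2"
}

# verify_manifest DIR MANIFEST: re-hash the same relative paths under DIR.
# Prints OK/FAILED per file and returns non-zero if any file differs or is missing.
verify_manifest() {
    # The manifest is fed on stdin so a relative MANIFEST path survives the cd.
    ( cd "$1" && sha256sum -c ) < "$2"
}
```

Typical use would be make_manifest /mnt/original ~/checksums followed by verify_manifest /mnt/copy ~/checksums, with both filesystems mounted (preferably read-only) at those placeholder paths.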

Tom Hunt
  • If the whole disk was copied then the copy is byte-for-byte, no matter what the disk contains. But the disks have to have exactly the same size. – Gilles 'SO- stop being evil' Oct 08 '15 at 23:08
  • True. As well, a whole lot of things might change one byte on the disk, including those you wouldn't think would. Mounting any filesystem, of course, but also various block-layer things like LUKS, LVM, RAID metadata. Whether the disk is byte-for-byte identical is probably not the most useful data. – Tom Hunt Oct 08 '15 at 23:24

Two inputs have the same cryptographic checksum only if they're identical. Identical inputs must by definition have the same length. Approximately the same length isn't good enough; it must be exactly the same.

A “1TB” HDD, in practice, has very close to 1000^4 = 1,000,000,000,000 bytes. A “1TB” SSD is usually closer to 2^40 = 1024^4 = 1,099,511,627,776 bytes. Your SSD is therefore a little larger than your HDD: the copy overwrote most of the SSD but left some unused space at the end. When you calculated the checksum of the SSD, you hashed the whole device, including that unused space.
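
To put a number on that gap, here is a small arithmetic sketch using the nominal sizes quoted above (not sizes measured from any real drive). It assumes a shell with 64-bit arithmetic, and size_gap is a made-up name.

```shell
# Nominal capacities from the paragraph above, in bytes (not measured values).
decimal_tb=$((1000 * 1000 * 1000 * 1000))   # 1000^4
binary_tb=$((1024 * 1024 * 1024 * 1024))    # 1024^4 = 2^40

# size_gap: bytes at the end of the larger device that the copy never touches.
size_gap() {
    echo $((binary_tb - decimal_tb))
}

size_gap   # prints 99511627776
```

Under these assumed sizes, whatever happened to be in those trailing bytes of the SSD is fed into the checksum as well, which is enough to change it.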

You can calculate the checksum of the data on the SSD by checking the size of the hard disk, which fdisk -l /dev/sdh will tell you (assuming /dev/sdh is the HDD). There's also a size in /proc/partitions, but it's in units of 1kB (1024 bytes), with no indication if the size isn't a multiple of 1kB. I think all hard disks of this size have a size that's a multiple of 4kB, so this should be ok. Then you can run </dev/sdd head -c 1000196757504 | sha1sum (assuming /dev/sdd is the SSD, and 1000196757504 is the size of the HDD) to compute the checksum of the copy.
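
The same idea as a reusable sketch: hash only the first N bytes of each device so that both checksums cover the same length. It assumes Linux with util-linux (blockdev --getsize64) and a head that accepts -c. The name hash_prefix is made up, and /dev/sdh and /dev/sdd are just the example device names used above.

```shell
# hash_prefix SRC NBYTES: print the SHA-1 of only the first NBYTES bytes of SRC.
# (Hypothetical helper name; SRC may be a block device or a regular file.)
hash_prefix() {
    head -c "$2" < "$1" | sha1sum | cut -d ' ' -f 1
}

# Intended use on the real devices (needs root; adjust the device names):
#   hdd_size=$(blockdev --getsize64 /dev/sdh)   # exact HDD size in bytes
#   hash_prefix /dev/sdh "$hdd_size"            # checksum of the source
#   hash_prefix /dev/sdd "$hdd_size"            # same-length prefix of the copy
```

If the two outputs match, the first hdd_size bytes of the SSD are identical to the HDD. With GNU diffutils, cmp -n "$hdd_size" /dev/sdh /dev/sdd should give the same answer in a single pass and also reports the offset of the first difference.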

But calculating these checksums is not very useful. If there had been an error during the copy, cat would have told you. Comparing the disks can be useful as a sanity check that you've copied what you intended to copy, but mounting the partitions serves the same purpose.

Note that once you've mounted the partitions, the content will be different, since mounting records some metadata in the filesystem, such as the last mount date. Even a read-only mount may in fact modify the device, in particular to replay the journal on journaling filesystems.

  • Comparing checksums is the ultimate test because it gives very high certainty. During copy, there can be bit flips that are not reported, and the more data you copy, the more likely there is a bit flip. ECC memory eliminates some error paths, but not all. I would always only bet on checksums. – Michael F Aug 21 '23 at 12:12