18

I want to back up my SSD using the Linux dd command, but I'm not sure how reliable that method will be. I think I read somewhere that dd does not check for or report errors; if that is true, it is a deal breaker.

This will be the command:

sudo dd status=progress bs=512K if=/dev/nvme0n1 of=/media/d/ssd.img

So please explain how reliable the dd command can be for said use case.

Also, are there any more reliable and/or easier alternatives?

  • 4
    I do not think dd is considered a good backup tool. It is also slow, because it copies all the empty space as well. https://askubuntu.com/questions/2596/comparison-of-backup-tools I prefer just to use rsync, but copy to different devices at different times. https://help.ubuntu.com/community/BackupYourSystem & https://help.ubuntu.com/community/CategoryBackupRecovery – oldfred May 17 '21 at 02:29
  • 11
    "I think I read somewhere that dd does not check for or report errors" Really!!?? Looks like fake news, about a coreutil piece of software. dd works well. But unless you are trying to build an exact image, I wouldn't call it a backup tool. – Eduardo Trápani May 17 '21 at 03:29
  • I am trying to build an exact image. My goal here is to dd the SSD, nvme secure erase it, and then send the device to the service center. What I want to know is whether I will be able to restore it later with no errors. – user472052 May 17 '21 at 03:58
  • @EduardoTrápani Yes, it looks like I read the blog wrong. It actually said that dd behaves in the aforementioned way when the noerror option is given. – user472052 May 17 '21 at 04:01
  • @user472052 Just my two cents on this: your data is probably more valuable to you than your Linux installation. Using an image file presents a bigger risk to your data than copying the files as files, even if restoring your installation might be easier from an image file -- if all goes well. If I were you, I would back up the files from the disk, each partition separately, and take note of the disk partitioning. When the disk is replaced, I would recreate the partitions and restore the files from backup. – Johan Myréen May 17 '21 at 10:40
  • 3
    "I think I read somewhere that dd does not check for or report errors." You can verify this is not true from the documentation: "As a simple rescue method, call dd as shown in the following example: the operand ‘conv=noerror,sync’ is used to continue after read errors and to pad out bad reads" – Bert May 17 '21 at 11:45
  • 1
    Note that it is not uncommon for manufacturers to send back to you a disk with characteristics not identical to the original one. Nowadays that's probably just a different slightly higher number of blocks so it shouldn't be much of an issue, but don't expect the disk to be exactly identical to the original one. IIRC there are a few partitioning or RAID schemes which are based on the end of the volume/partition, though. – jcaron May 17 '21 at 13:58
  • I think so. I prefer working with images (over file-system snapshots), and I prefer dd when connecting streams and block devices, e.g. dd if=/dev/sda bs=10M | sha1sum, rather than reading or writing a stream to a block device directly (sha1sum </dev/sda). There should be no difference in the data copied, but there may be a difference in hardware performance; this is more obvious on storage devices that are very slow to react, and on SSDs it is somewhat moot. A dd rescue variant is helpful when the hardware is degraded or failed. – ThorSummoner May 19 '21 at 00:12
  • never trust the software to make a copy on its own, always use a secondary truth like a checksum to verify the copy is intact before destroying what might be an only-good-copy :) (this is one reason I prefer working with disk images to millions of files which each need to be tested in this way) – ThorSummoner May 19 '21 at 00:15

8 Answers

30

TLDR: Use ddrescue

It supports resume/continue capabilities, has automatic logs, and tons of other options. More at the ddrescue home page.

Example syntax:

ddrescue /dev/sde yourimagename.image sde.log

If you want to restore the image created by the command above onto another drive of exactly the same size (you mentioned restoring in your comment):

ddrescue -f yourimagename.image /dev/sde restore.logfile

Furthermore, it appears to be faster than dd, at least when comparing the speed of ddrescue against dd + pv.

Jeff Schaller
  • 67,283
  • Does it work well with NVMe drives? My goal here is to create an image of the SSD, nvme secure erase it, and then send the device to the service center. What I want to know is whether I will be able to restore it later with no errors. – user472052 May 17 '21 at 04:02
  • Yes, it should work fine for NVMe. ddrescue is used on many different types of drives and filesystems; feel free to check the documentation (which I linked in my post) if you'd like. I updated my post with more info on restoring using the image from the previous output. @user472052 – Nordine Lotfi May 17 '21 at 04:13
  • One more thing: do I need to add the ./ part to refer to the output file? Can I just write disk.img disk.log and omit the ./ part? – user472052 May 17 '21 at 05:35
  • oh, no that was a small error on my part -- you don't need it :) (if you noticed, i didn't use ./ on the second command, so yeah) @user472052 – Nordine Lotfi May 17 '21 at 05:45
  • I'm surprised to see no mention of clonezilla in the comments and answers. – Darren May 17 '21 at 15:59
  • 8
    Plain dd is fast if you use bs=128k or something, to make read/write system calls in 128kiB blocks instead of the default 512-byte sectors. (About half of your CPU's L2 cache size is a good tradeoff between per-system-call overhead vs. the kernel's copy_to_user / copy_from_user cache hits). Piping it through pv obviously costs more CPU time, although if you run pv to have it attach to an already running process (looking at file-positions in /proc/<PID>/fdinfo) then it avoids that memory bandwidth cost. I assume ddrescue chooses a reasonable block size by default. – Peter Cordes May 17 '21 at 18:05
  • 2
    @PeterCordes OTOH, because of stuff like the -S option you can make ddrescue even faster than plain dd, as well as being able to do some other useful things like pre-allocating the output file or bypassing the OS’s page cache. – Austin Hemmelgarn May 17 '21 at 18:08
  • True, but I mainly mentioned/compared the tool to dd + pv because they both give a similar experience, even if one is better suited (in term of feature set) to be a backup tool in this case... @PeterCordes I didn't think of the trick you mentioned for using pv though, this is pretty nice :) – Nordine Lotfi May 17 '21 at 18:59
  • 2
    If you just want progress stats, GNU dd status=progress shows what it's doing. You don't need pv for that. (And even on any Unix, you could just ls -l the output file whenever you want.) – Peter Cordes May 17 '21 at 19:03
  • That's good to know! I've been using ddrescue for quite some time myself and never knew this flag :D Thanks for mentioning this @AustinHemmelgarn – Nordine Lotfi May 17 '21 at 19:04
  • 1
    @AustinHemmelgarn: GNU dd has conv=sparse. IDK when that feature was added. It doesn't have a preallocate option; you have to use fallocate(1) and dd bs=128k conv=notrunc,sparse. Although I think sparseness detection works in units of the bs, so you might want a smaller block size like 16k. For a device that might have read errors, certainly ddrescue is a very good choice, but other than that you don't need it. If you know its options then by all means use it. Good point about tricks like sparsifying the output, especially if you just used fstrim on your FS before backup. – Peter Cordes May 17 '21 at 19:33
  • @AustinHemmelgarn: Also note that pre-allocate + sparse is less useful than one would like. The unwritten space stays allocated, and even freeing it later with fallocate -d (dig-holes) means the non-zero parts of the file have gaps, making disk / fs read-ahead less effective at getting data from the next non-hole. And the free space not consumed isn't contiguous. One or the other of sparse or prealloc is certainly useful, though, so it's nice to have both features available all in one tool, like how rsync has --preallocate and --sparse options. – Peter Cordes May 18 '21 at 17:40
  • about the speed, dd bs=10M is usually much faster than stock dd (10M is usually NOT the optimal speed for any drive, but it – hanshenrik May 18 '21 at 20:52
  • If you care about speed, ditch dd and go for the file-system specific tools (e2image, ntfsclone,…). You can also backup or replicate the partition table with sfdisk (MBR) or sgdisk (GPT). – Hermann May 19 '21 at 14:36
5

The original claim that dd does 'not check for or report errors' probably arose because, by default, 'dd' does not pad out bad reads, so for block-oriented devices not only the bad block but all subsequent blocks will be incorrect (because they're no longer aligned). As others have said, this is fixable using the 'conv=noerror,sync' option, which tells dd to continue after a read error and pad the failed block so that later blocks remain on block boundaries. It should be combined with a block-size setting (bs=) matching the filesystem block size, which is often 4096 bytes but can be lower.
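As a rough sketch of that option in use (not a tested rescue procedure), the script below copies a source in 4096-byte blocks with conv=noerror,sync. The paths are placeholders I made up: a throwaway file stands in for the device so it runs without root, and on a real system if= would be the unmounted device, e.g. /dev/nvme0n1.

```shell
#!/bin/sh
set -eu

# Throwaway stand-ins for a real device and image (hypothetical paths).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.bin"   # on a real system: the unmounted device
img="$workdir/backup.img"

# Create a 1 MiB test source (256 blocks of 4096 bytes).
dd if=/dev/urandom of="$src" bs=4096 count=256 status=none

# noerror: keep going after a read error instead of stopping.
# sync:    pad each short or failed input block with zero bytes up to bs,
#          so that later blocks stay at their original offsets.
dd if="$src" of="$img" bs=4096 conv=noerror,sync status=none

# With no read errors, and a source size that is a multiple of bs,
# the copy is byte-for-byte identical.
cmp "$src" "$img" && echo "identical"
```

Note that sync pads every short block, so if the source size is not a multiple of bs the image ends up slightly larger than the source.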

I would agree with Johan Myréen's comment about using file backup, because the granularity of the backup is much smaller - an error backing up one file doesn't necessarily affect the others. You could also use a file system that uses error-correction on the file data (such as zfs, btrfs, and some configurations of others), so at the very least you know when errors happen and hopefully can fix them.

Another way to detect bad backups would be to use a message digest hash code, e.g. 'sha256' on the raw device (unmounted!!) and on the dd backup file... they should of course be the same.
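A minimal sketch of that check, assuming GNU coreutils (sha256sum, dd with status=none). The file names are invented for illustration, and a scratch file plays the role of the raw device; with a real disk you would hash the unmounted device as root instead.

```shell
#!/bin/sh
set -eu

# Scratch files standing in for the device and its image (hypothetical).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.bin"   # on a real system: the unmounted device
img="$workdir/ssd.img"

# Make a 1 MiB source and image it the same way the question does.
dd if=/dev/urandom of="$src" bs=64K count=16 iflag=fullblock status=none
dd if="$src" of="$img" bs=512K status=none

# Hash both sides; reading from stdin keeps file names out of the output.
src_sum=$(sha256sum < "$src" | cut -d' ' -f1)
img_sum=$(sha256sum < "$img" | cut -d' ' -f1)

if [ "$src_sum" = "$img_sum" ]; then
    echo "OK: checksums match"
else
    echo "MISMATCH: do not erase the source" >&2
    exit 1
fi
```

Only erase the original once the two digests agree.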

Finally, best practice in backups is never to rely on only one backup... keep a minimum of 2!

rivimey
  • 216
1

A disk cloning utility such as clonezilla will make a compressed copy of your disk, including partition tables etc., but omitting unallocated space and also unused space in well-known file systems such as ext4.

Obviously it is much faster to only copy data and not the unused space.

If you suspect that your NVME storage cannot be relied upon for read operations, then perhaps you might want to consider software such as rsync to extract most/all of your data (in addition to a cloning operation).

Jeremy Boden
  • 1,320
0

dd is quite OK for imaging a whole disk. You could pipe it through compression, and unused SSD space should compress well. Optimal block sizing helps.

However, compression does not work well with encryption.
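For what it's worth, piping dd through a compressor could look like the sketch below. gzip is just my pick for the example (any stream compressor would do), the paths are made up, and a file of zeros stands in for trimmed or unused SSD space so the script runs without root.

```shell
#!/bin/sh
set -eu

# Scratch files standing in for the device and the restore target.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.bin"        # on a real system: the unmounted device
restored="$workdir/restored.bin" # on a real system: the replacement device

# 4 MiB of zeros mimics unused space, which compresses very well.
dd if=/dev/zero of="$src" bs=1M count=4 status=none

# Backup: read the source and compress the stream on the fly.
dd if="$src" bs=512K status=none | gzip -c > "$workdir/ssd.img.gz"

# Restore: decompress and write the stream back out.
gzip -dc "$workdir/ssd.img.gz" | dd of="$restored" bs=512K status=none

cmp "$src" "$restored" && echo "restored image matches"
ls -l "$workdir/ssd.img.gz"
```

On an encrypted volume the data looks random, so expect little or no size reduction, as noted above.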

Further, copying file system contents will provide a performance benefit and naturally create contiguous files, and while we may not think this matters for SSD, every little bit helps and contiguous free space helps.

.. and yes the performance benefit comes from not copying the space, or even the cluster tips.

ddrescue shines for magnetic media. If your SSD has errors, no amount of retrying is going to help, and there is no platter or moving head to settle from another direction. Ultimately, though, you need to compare the entire images (or their hash digests, which is often considered close enough) to verify the copy.

mckenzm
  • 327
0

dd, as the name suggests, makes a disk image backup. If you have a 1 TB drive with 100 GB used, the disk image will still be 1 TB. The conv=sparse option exists for dd if you know there are zero-blocks, and ddrescue has the --sparse option.
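To illustrate conv=sparse, here is a small sketch assuming GNU dd and GNU du. The file names are hypothetical, and the 16K block size is only an example value; a scratch file with 1 MiB of data followed by 7 MiB of zeros stands in for a mostly empty drive.

```shell
#!/bin/sh
set -eu

# Scratch files standing in for a mostly empty drive and its image.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.bin"
img="$workdir/sparse.img"

# 1 MiB of random data followed by 7 MiB of zeros (8 MiB total).
dd if=/dev/urandom of="$src" bs=1M count=1 iflag=fullblock status=none
dd if=/dev/zero of="$src" bs=1M count=7 seek=1 conv=notrunc status=none

# conv=sparse: seek over all-zero output blocks instead of writing them.
dd if="$src" of="$img" bs=16K conv=sparse status=none

# Contents are the same; only the space allocated on disk may differ.
cmp "$src" "$img" && echo "contents identical"
du -k --apparent-size "$img"   # logical size
du -k "$img"                   # blocks actually allocated
```

How much space is saved depends on the filesystem holding the image; one without hole support will allocate the full size anyway.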

However, for backup purposes, you may not need a raw disk image. Copying files with rsync or putting everything into a (possibly compressed) tarball is sufficient for most backup use-cases, and working on a file level with a filesystem can be nicer than directly working with a mounted image. rsync also has a convenient delta-transfer algorithm which only sends the differences in files for updating backups (not the same as incremental backups, for which you'll need additional software).
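A file-level backup along those lines might look like this sketch, which uses a compressed tarball. The directory and file names are invented for the example; on a real system the source would be the mounted filesystem you want to save.

```shell
#!/bin/sh
set -eu

# Scratch directories standing in for the data and the restore target.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/data/docs" "$workdir/restore"
echo "hello" > "$workdir/data/docs/note.txt"

# Backup: -C changes into the source directory first, so the archive
# stores relative paths rather than absolute ones.
tar -czf "$workdir/backup.tar.gz" -C "$workdir/data" .

# Restore into a separate directory.
tar -xzf "$workdir/backup.tar.gz" -C "$workdir/restore"

# Compare the two trees file by file.
diff -r "$workdir/data" "$workdir/restore" && echo "restore matches"
```

Unlike a raw image, this does not capture the partition table or boot sectors, so note the disk layout separately if you need to recreate it.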

qwr
  • 709
  • "As the name suggests"? Out of curiosity: how does the name suggest anything? I've seen the one or the other d explained as "disk", "drive", "data", "dump", "destroyer". How to get from dd to " disk image backup" is a mystery to me. If I didn't know dd, its name would suggest me nothing. – Kamil Maciorowski May 18 '21 at 08:15
  • 1
    It was supposed to be a joke because dd famously has many nicknames but I guess the way I wrote it the joke doesn't come across at all – qwr May 18 '21 at 09:09
  • https://unix.stackexchange.com/questions/6804/what-does-dd-stand-for – Ljm Dullaart May 18 '21 at 09:21
  • The use case of the OP as described in the comments (sigh) is to produce a disk image. – Peter - Reinstate Monica May 18 '21 at 12:00
0

Have a look at partclone (apt-get install partclone), I think it is what you are looking for.

Jon
  • 1
0

Yes, dd is fine but not really necessary.

I just use "cat" plus pbzip2 to create compressed images, e.g.:

# cat /dev/nvme0n1 | pv | pbzip2 > myimage.img.bz2
Mark
  • 1
0

This hypocritic answer should have probably just been a comment.

Use rsync. It's the tool for the job.

Vorac
  • 3,077