What's faster, dd 1.5TB or rsync 500GB?

Question

I need to save data from a failing hard drive.

Sounds like ddrescue or myrescue (or maybe clonezilla?) will be my best friends here, but I'm just wondering what will likely be faster:

1. using dd/ddrescue/myrescue/clonezilla to simply clone the failing drive to a new drive of identical capacity, or
2. using rsync/tar/cp to move files from the failing drive to a new drive?

dd-ish choices avoid moving data back and forth between kernel-space and user-space, right? But rsync and others avoid moving empty space, right?

Another oddly fortunate bit if I choose a dd-ish solution: the failing drive is currently mounted read-only (part of the failure process, I think) so I guess I don't have to worry about data changing while I'm dd'ing.

This is the root partition, so dd would be handy in that I should be able to boot the new drive after it completes.

score 14 · Accepted Answer · answered Mar 14 '12 at 18:48

No question, rsync will be faster. dd will have to read and write the whole 1.5TB and it will hit every bad block, triggering multiple read retries which will further slow an already long process. rsync will only have to read blocks that matter, and since it is unlikely that every bad block occurs in existing files or directories, rsync will encounter fewer of them.
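To put rough numbers on that comparison, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the throughput figures, the bad-block counts, the seconds lost per bad block, and the helper name `estimated_hours` are all made up, not measurements from any real drive.

```python
def estimated_hours(data_bytes, throughput_bytes_per_s,
                    bad_blocks=0, seconds_per_bad_block=0.0):
    """Rough copy time: streaming time plus time lost retrying bad blocks."""
    streaming_s = data_bytes / throughput_bytes_per_s
    retry_s = bad_blocks * seconds_per_bad_block
    return (streaming_s + retry_s) / 3600


TB = 10**12
GB = 10**9
MB = 10**6

# Assumed figures only. dd reads the whole 1.5TB device sequentially and
# hits every bad block; rsync reads only the 500GB of files (assumed to
# stream a bit slower per byte) and hits only the bad blocks inside them.
dd_hours = estimated_hours(1.5 * TB, 80 * MB,
                           bad_blocks=1000, seconds_per_bad_block=5)
rsync_hours = estimated_hours(500 * GB, 60 * MB,
                              bad_blocks=300, seconds_per_bad_block=5)

print(f"dd:    about {dd_hours:.1f} hours")
print(f"rsync: about {rsync_hours:.1f} hours")
```

Under these assumptions the file-level copy wins mainly because it moves a third of the data. If you plug in 1.5TB for the rsync case as well (a completely full disk), the advantage disappears.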

The bad thing about using rsync for disk rescue is that if it does encounter a bad block, it gives up on the file or directory that contains it. If a directory contains a lot of subdirectories and rsync gives up on reading it, then your copy could be missing a lot of what you want to save. The problem is that rsync relies on the filesystem structures to tell it what to copy and the filesystem itself is no longer trustworthy.

For this reason I would first use rsync to copy files off the drive, but I would look very carefully at the output to see what was missed. If you can't live without what rsync failed to copy, then use dd or one of the other low level block copying methods. You can then fsck the copy, mount it and see if you can recover more of your files.
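One way to make "look very carefully at the output" less error-prone is to have a small wrapper check rsync's exit status and keep a log. This is only a sketch: the function names and the mount points in the usage note are hypothetical, and the option set (`-aHx`) is one reasonable choice rather than the only one. Exit codes 23 and 24 are the "partial transfer" codes listed in the rsync man page.

```python
import subprocess

# Exit codes documented in the rsync man page.
RSYNC_PARTIAL_ERROR = 23     # partial transfer due to error
RSYNC_PARTIAL_VANISHED = 24  # partial transfer due to vanished source files


def build_rsync_command(source, dest, log_file):
    """Build the rsync argument list.

    -a archive mode, -H preserve hard links, -x stay on one filesystem
    (useful for a root partition); --log-file records what was copied
    and what failed.
    """
    return ["rsync", "-aHx", "--log-file=" + log_file, source, dest]


def describe_result(returncode):
    """Classify an rsync exit status for the purpose of a rescue copy."""
    if returncode == 0:
        return "complete"
    if returncode in (RSYNC_PARTIAL_ERROR, RSYNC_PARTIAL_VANISHED):
        return "partial"  # some files were skipped; read the log
    return "failed"


def run_rescue_copy(source, dest, log_file):
    """Run rsync and report whether a block-level pass is worth considering."""
    result = subprocess.run(build_rsync_command(source, dest, log_file))
    return describe_result(result.returncode)
```

For example, `run_rescue_copy("/mnt/failing/", "/mnt/new/", "rescue.log")` (hypothetical paths) returns `"partial"` when rsync skipped something. In that case the log shows which files were missed, and you can decide whether they justify a dd/ddrescue pass.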

Kyle Jones


Awesome, thanks! For some reason I thought moving data between kernel-space and user-space was expensive, and dd avoids this, but I might be making that up. – Adam Monsen Mar 14 '12 at 19:19
My buddy kormoc tells me: "kernel <-> userspace is just memory copies. It copies way faster than the disk actually responds, so it's almost free. dd still does the kernel -> user -> kernel layer changes as well. dd does not run in kernel space." – Adam Monsen Mar 14 '12 at 20:42
It also makes sense that, if the disk were full (1.5TB of actual non-sparse data), rsync would be slower than dd. – Adam Monsen Mar 14 '12 at 20:48
I agree that rsync will be faster if the disk is readable. If there's any chance that the disk is partially unreadable and you have no backup of the contained data, I'd rather run ddrescue to make as exact an image of the disk as possible. After that, I'd make another copy of the "exact copy", run fsck on that and only then mount it. This way the "exact copy" of your failing disk is kept unmodified for additional rescue operations. – Mikko Rantalainen Jun 29 '12 at 08:08
