Why does ddrescue not use distinct mapfiles for read and write errors? (And how to detect write errors?)

Question

Sometimes, I have to clone a hard drive to another one that seems healthy (SMART values OK), but whose surface could not be checked fully for possible bad sectors.

Typically, if I clone a healthy hard drive, I might use a destination drive that was not previously wiped and, accordingly, not fully checked.

Please correct me if I'm wrong: I believe that ddrescue only reports read errors. That would mean ddrescue could report a successful clone with no errors even if some sectors could not be copied to the destination drive because of write errors. Likewise, the mapfile would not tell you whether there are errors on the destination drive.

So I have always wondered why ddrescue does not generate two mapfiles (e.g. read.log and write.log), and I assume the short answer might be "This hasn't been implemented yet."

This leads to a second question: is there a way to detect write errors?

N.B. I assume computing checksums of two 1 TB drives after cloning would take a while. Is there a better solution?

score 7 · Answer 1 · answered May 19 '23 at 12:58

ddrescue doesn't log write errors because write errors are fatal. Every time it writes a block, it knows how many bytes it expects to write, and if that many bytes aren't written successfully, it aborts with an error message. You can see this by running:

$ ddrescue --force /dev/zero /dev/full
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:        0 B, non-trimmed:        0 B,  current rate:       0 B/s
     opos:        0 B, non-scraped:        0 B,  average rate:       0 B/s
non-tried:    9223 PB,  bad-sector:        0 B,    error rate:       0 B/s
  rescued:        0 B,   bad areas:        0,        run time:          0s
pct rescued:    0.00%, read errors:        0,  remaining time:         n/a
                              time since last successful read:         n/a
Copying non-tried blocks... Pass 1 (forwards)
ddrescue: Write error: No space left on device

If ddrescue completes successfully, then as far as it is aware, all the data that it managed to read was correctly written.
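The "expected bytes vs. written bytes" check described above can be sketched in Python as follows. This is a minimal illustration of the idea under stated assumptions, not ddrescue's own (C++) code; the helper name write_block is hypothetical, and os.pwrite is only available on Unix-like systems.

```python
import os


def write_block(fd: int, data: bytes, offset: int) -> None:
    """Write all of `data` at `offset` or raise OSError.

    Hypothetical helper illustrating the "expected vs. written bytes" check;
    it is not taken from ddrescue's source.
    """
    written = 0
    while written < len(data):
        # os.pwrite may write fewer bytes than requested; it raises OSError
        # (e.g. ENOSPC or EIO) when the kernel reports a write failure.
        n = os.pwrite(fd, data[written:], offset + written)
        if n == 0:
            # No progress at all: treat it as fatal rather than skipping.
            raise OSError(f"write made no progress at offset {offset + written}")
        written += n
```

With this design there is no "partially written" state for a mapfile to track: either every byte of the block lands, or the run stops with an error.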

Adding a bit to this: ddrescue is designed for data recovery. It therefore sensibly assumes that the place you’re copying the data to is reliable and not going to randomly fail during the transfer. — Austin Hemmelgarn, May 20 '23 at 13:21
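If you would rather not rely on that assumption, one option is to re-read both drives after cloning and compare them block by block instead of hashing them. The Python sketch below returns the byte offsets of the first few differing blocks; the function name first_mismatches, the 1 MiB block size and the limit of 10 are arbitrary, hypothetical choices. It still has to read both drives end to end, so it is not dramatically faster than checksumming; it mainly skips the hashing step and tells you where the differences are. For whole devices, `cmp /dev/sdX /dev/sdY` (placeholder device names) performs essentially the same comparison.

```python
def first_mismatches(src_path, dst_path, block_size=1 << 20, limit=10):
    """Return the byte offsets of up to `limit` blocks that differ.

    The 1 MiB default block size and the limit are arbitrary choices.
    """
    mismatches = []
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while len(mismatches) < limit:
            a = src.read(block_size)
            if not a:
                break  # reached the end of the source
            b = dst.read(len(a))  # a shorter destination also counts as a mismatch
            if a != b:
                mismatches.append(offset)
            offset += len(a)
    return mismatches
```

Two caveats: on Linux the reads may be served from the page cache rather than the disk unless the cache is flushed first (for example `sync; echo 3 > /proc/sys/vm/drop_caches` as root), and any areas ddrescue recorded as bad in its mapfile will legitimately differ.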

score 5 · Answer 2 · answered May 19 '23 at 12:15

The badblocks program reports read errors, write errors, and corruption errors, see e.g. here.

If you want to make sure your destination disk is good, run badblocks in destructive mode on it. But keep in mind that bad sectors will be remapped on write, as outlined in the other answer.
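To illustrate what a destructive test does, here is a simplified Python sketch of the write-then-verify idea. The four patterns are the ones the badblocks man page lists for -w mode (0xaa, 0x55, 0xff, 0x00); the function name destructive_test and the 4096-byte block size are hypothetical. Unlike badblocks, it reads back through the page cache, so a read may be satisfied from RAM; treat it as an explanation, not a replacement. It destroys all data on the target.

```python
import os

# Patterns listed for badblocks -w mode; everything else here is an assumption.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def destructive_test(path, block_size=4096):
    """Overwrite every block with each pattern, read it back, and return the
    sorted indices of blocks that failed (write error, read error or mismatch).

    WARNING: destroys all data at `path`.
    """
    bad = set()
    fd = os.open(path, os.O_RDWR)
    try:
        # lseek to the end also reports the size of a block device.
        num_blocks = os.lseek(fd, 0, os.SEEK_END) // block_size
        for pattern in PATTERNS:
            buf = bytes([pattern]) * block_size
            for i in range(num_blocks):  # write pass
                try:
                    if os.pwrite(fd, buf, i * block_size) != block_size:
                        bad.add(i)
                except OSError:
                    bad.add(i)
            os.fsync(fd)  # push the writes out before reading back
            for i in range(num_blocks):  # read-back and compare pass
                try:
                    if os.pread(fd, block_size, i * block_size) != buf:
                        bad.add(i)
                except OSError:
                    bad.add(i)
    finally:
        os.close(fd)
    return sorted(bad)
```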

As to "does ddrescue report write errors", the simplest way to find out is to look at the code, or to experiment by setting up a destination that will cause an error. My assumption, though, would be that writes follow standard Unix conventions and that failures are reported as errors, as in any other tool.
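Such an experiment is easy with /dev/full, which on Linux rejects every write with ENOSPC. The Python sketch below (try_write is a hypothetical helper) shows the standard Unix convention: a failed write raises an OSError carrying an errno, while a short write only shows up as a smaller return value that the caller has to check.

```python
import errno
import os


def try_write(path, data=b"\0" * 4096):
    """Write `data` once to `path`.

    Returns None on a complete write, "short write" if fewer bytes were
    accepted, or the errno name (e.g. "ENOSPC") if the kernel reported an error.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        n = os.write(fd, data)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        os.close(fd)
    return None if n == len(data) else "short write"
```

On Linux, `try_write("/dev/full")` returns `"ENOSPC"`, the same error ddrescue prints in the first answer.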

As to "why", your guess is as good as mine, but one underlying principle of Unix is "keep it simple, and separate concerns into different tools".

The focus of ddrescue is getting data off a damaged disk. There are other tools to make sure you have something reliable to write the result to.

Chris Davies · Answer 3 · answered May 19 '23 at 22:39

Modern disks handle write failures transparently and automatically by remapping the failed sector to one of its spares available specifically for such a situation.
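The behaviour described above can be illustrated with a toy Python model. It is a deliberate simplification: real firmware keeps its own internal tables and typically flags a sector as pending after a failed read, reallocating it on the next write. The class and attribute names are invented for illustration. What it shows is that the host only sees a write error once the spares are exhausted.

```python
import errno


class ToyDisk:
    """Toy model of firmware sector remapping (illustration only).

    `failing` holds LBAs whose physical sector can no longer be written;
    `spares` is how many reserve sectors the firmware still has.
    """

    def __init__(self, failing, spares):
        self.failing = set(failing)
        self.spares = spares
        self.remapped = {}  # LBA -> index of the spare sector now used for it
        self.data = {}

    def write(self, lba, value):
        if lba in self.failing and lba not in self.remapped:
            if self.spares == 0:
                # No spare left: only now does the host see a write error.
                raise OSError(errno.EIO, f"write failed at LBA {lba}")
            self.spares -= 1
            self.remapped[lba] = len(self.remapped)  # invisible to the host
        self.data[lba] = value
```

In this model, `len(remapped)` plays roughly the role of a reallocated-sector counter.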

You can see the status in SMART results:

smartctl -a /dev/sda
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0

In this example, the RAW_VALUE (the last column) for ID 5 shows that no sectors have been reallocated. The value for ID 197 is the number of sectors that have reported a read error but have not yet been rewritten and reallocated. The value for ID 198 is the number of sectors with read/write errors that cannot be reallocated, usually because so many other sectors have failed that no spares are left.

Only in this last situation will a write be reported to the OS as failed, and ddrescue, like any other tool, is then likely to fail when the kernel in turn passes the error on to the application.
