12

An external 3½" HDD seems to be in danger of failing — it's making ticking sounds when idle.

I've acquired a replacement drive, and want to know the best strategy to get the data off of the dubious drive with the best chance of saving as much as possible.

There are some directories that are more important than others. However, I'm guessing that picking and choosing directories is going to reduce my chances of saving the whole thing. I would also have to mount it, dump a file listing, and then unmount it in order to be able to effectively prioritize directories. Adding in the fact that it's time-consuming to do this, I'm leaning away from this approach.

I've considered just using dd, but I'm not sure how it would handle read errors or other problems that might prevent only certain parts of the data from being rescued, or which could be overcome with some retries, but not so many that they endanger other parts of the drive from being saved. I guess ideally it would do a single pass to get as much as possible and then go back to retry anything that was missed due to errors.

Is it possible that copying more slowly — e.g. pausing every x MB/GB — would be better than just running the operation full tilt, for example to avoid any overheating issues?

For the "where is your backup" crowd: this actually is my backup drive, but it also contains some non-critical and bulky stuff, like music, that isn't a backup of anything, i.e. isn't backed up.

The drive has not exhibited any clear signs of failure other than this somewhat ominous sound. I did have to fsck a few errors recently — orphaned inodes, incorrect free blocks/inodes counts, inode bitmap differences, zero dtime on deleted inodes; about 20 errors in all.

The filesystem of the partition is ext3.

intuited
  • 3,538

3 Answers

11

You can use ddrescue, dd_rescue, or myrescue to clone the failing disk without aborting on unreadable sectors. (Myrescue is less configurable but has a better default strategy, as it tries to skip over unreadable regions.) This copies everything, including blank space, and won't let you set priorities. However, such a low-level approach has an advantage over filesystem-level tools: if a directory is unreadable, you might still recover the files it contains by searching the raw image with tools such as foremost, magicrescue, or photorec (included in testdisk).
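As a rough illustration of the cloning approach, here is a minimal sketch using GNU ddrescue in two passes, following the pattern shown in its manual. The device name and the image and mapfile paths are placeholders, not values from the question, and the `run` wrapper is a hypothetical helper that only prints the commands unless `RUN=1` is set, so the plan can be reviewed before anything touches the failing disk.

```shell
#!/bin/sh
# Sketch only: SRC, IMG and MAP are placeholder names; adjust them to your system.
SRC=${SRC:-/dev/sdX}                 # the failing disk (whole device, unmounted)
IMG=${IMG:-/mnt/new/old-disk.img}    # image file on the replacement drive
MAP=${MAP:-/mnt/new/old-disk.map}    # mapfile recording which areas were read

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

rescue_plan() {
    # Pass 1: copy everything that reads easily and skip the scraping phase (-n).
    run ddrescue -n "$SRC" "$IMG" "$MAP"
    # Pass 2: return to the bad areas with direct disc access (-d),
    # allowing up to 3 retry passes (-r3). The mapfile makes this resumable.
    run ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
}

rescue_plan
```

Once the image exists, the file-carving tools mentioned above can be pointed at the image file rather than at the failing disk.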

  • The link for magicrescue seems to be broken; did you mean https://www.itu.dk/~jobr/magicrescue/ instead? – landroni Feb 01 '14 at 10:39
  • @landroni Yes, I guess this student graduated, thanks. – Gilles 'SO- stop being evil' Feb 01 '14 at 14:14
  • "Myrescue [..] has a better default strategy" Would you consider posting a ddrescue example configured with the myrescue defaults? Thanks! – landroni Feb 01 '14 at 15:23
  • 1
    For ddrescue/dd_rescue you make a first pass with a large block size and switch to smaller ones in the following passes (e.g. halving the size each time). This of course requires you to use a logfile (see the man page). – peterph Feb 01 '14 at 16:38
  • Confusingly, on Debian-based systems the command ddrescue comes from package gddrescue; dd_rescue from package ddrescue; and myrescue from package myrescue. – landroni Feb 01 '14 at 17:10
  • @peterph I am now reading the docs of gddrescue and it seems to me that by default ddrescue does behave mostly like myrescue. In short: ddrescue reads the non-tried parts in large blocks, and skips the failed blocks (marking them as such) but also the slow areas. It then returns to the tried&failed areas, and reads each non-read sector one by one. Not exactly like myrescue, but pretty similar, is it not? – landroni Feb 02 '14 at 11:07
  • @Gilles Actually from the gddrescue docs, ddrescue initially skips over unreadable or slow-reading areas. – landroni Feb 02 '14 at 19:10
8

There's no way of knowing the best of your options without knowing exactly what is going wrong with the drive. If it's a mechanical failure, avoiding heating it up can help, but if it's due to errors in the servo data, heat isn't likely to matter.

I would immediately start copying the unique data to the new drive with rsync. rsync will let you pause, resume, and restart as necessary until you get all the data off.
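A minimal sketch of what such an rsync run might look like, copying the most important directories first and everything else afterwards. The mount points and directory names are hypothetical, the old filesystem is assumed to be mounted already (ideally read-only), and the `run` helper only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch only: all paths and directory names below are placeholders.
OLD=${OLD:-/mnt/old}                           # failing drive, assumed mounted here
NEW=${NEW:-/mnt/new}                           # replacement drive
PRIORITY_DIRS=${PRIORITY_DIRS:-"documents photos"}   # names without spaces

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

copy_plan() {
    # Important directories first, in the order listed.
    for dir in $PRIORITY_DIRS; do
        run rsync -a --partial "$OLD/$dir/" "$NEW/$dir/"
    done
    # Then everything; files already copied intact are skipped by rsync.
    run rsync -a --partial "$OLD/" "$NEW/"
}

copy_plan
```

Rerunning the same commands after an interruption resumes the copy, because rsync skips files whose size and modification time already match at the destination.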

Then I would run a data scrub on the drive. I assume from the ext3 filesystem that you're running Linux, so try this:

# umount /dev/sdX
# badblocks -n /dev/sdX

(Unmounting the drive first is important.)

This will read every sector from the disk and write it back without change. That will force the drive firmware to check every sector for errors and to remap any bad sectors it finds. This is the most important part of what the expensive SpinRite program does. Step up to that only if badblocks fails and you still haven't gotten all the unique data off the drive: SpinRite tries harder than badblocks does.

Warren Young
  • 72,032
2

If the disk is dying, first make as good a clone of it as you can (see Gilles' answer), and only then proceed to playing with the drive. That way you'll always have at least some data in case something goes wrong (which can often happen with failing hardware).

If you use ddrescue (or dd_rescue, I'm not sure about the others), you can always make a copy of the partially cloned data and the associated logfile and try to improve it by running ddrescue again after doing something that was supposed to fix the drive. It will try to read the missing parts while leaving the well-cloned parts untouched.
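The snapshot-then-retry idea could be sketched roughly as follows, assuming GNU ddrescue and an earlier run that produced an image and a mapfile. All paths are placeholders, the `.bak` names are just an illustrative convention, and the `run` helper only prints the commands unless `RUN=1` is set. Copying the image needs enough free space for a second full copy.

```shell
#!/bin/sh
# Sketch only: SRC, IMG and MAP are placeholder names from a hypothetical earlier run.
SRC=${SRC:-/dev/sdX}
IMG=${IMG:-/mnt/new/old-disk.img}
MAP=${MAP:-/mnt/new/old-disk.map}

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    printf '%s\n' "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

retry_plan() {
    # Keep a snapshot of the current state before trying anything new.
    run cp -p "$MAP" "$MAP.bak"
    run cp -p "$IMG" "$IMG.bak"
    # Re-run against the same image and mapfile: areas already marked as
    # rescued are left alone, and only the missing parts are attempted again.
    run ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
}

retry_plan
```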

peterph
  • 30,838