8

I'm having a hard time finding others with the same error, and I'm trying to figure out the best path forward.

I have a hard drive that was unusably slow, and then stopped booting. The Clonezilla clone failed, so I started a ddrescue run, using the GNU ddrescue tool included with a Clonezilla live CD. It is going unbelievably slowly, averaging about 400 kB/s for a 2 TB drive, so I'm estimating almost 4 months at this point! My last backup was sadly about 2 years ago, and there are a lot of pictures I'd like to get off of it. The surprising part is that it's rescued about 50 GB with no errors so far, even though it's taken 3 days. I have a few questions on the best path forward, and on why it would take so long but also not have any errors.
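For reference, the "almost 4 months" figure can be reproduced from the progress numbers above. This is only a back-of-the-envelope sketch: the function names are made up for illustration, decimal units (1 GB = 10^9 bytes) are assumed, and it assumes the drive keeps reading at the same average rate, which a failing drive may well not do.

```python
def remaining_days(total_gb, done_gb, elapsed_days):
    """Days left if the average rate so far holds (an assumption)."""
    rate_gb_per_day = done_gb / elapsed_days
    return (total_gb - done_gb) / rate_gb_per_day


def average_rate_kb_per_s(done_gb, elapsed_days):
    """Average throughput so far in kB/s, using decimal units."""
    return done_gb * 1e9 / (elapsed_days * 86400) / 1e3


# Numbers from the question: 2 TB drive, about 50 GB rescued after 3 days.
days_left = remaining_days(2000, 50, 3)
rate = average_rate_kb_per_s(50, 3)
print(f"about {days_left:.0f} days left at roughly {rate:.0f} kB/s on average")
```

With these inputs the estimate comes out at a little under four months, and the wall-clock average works out to roughly half of the 400 kB/s that ddrescue reports.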

Is the drive just taking forever to successfully read, but never actually failing, which slows down the copy? Is the hard drive itself fine, and something like the control board the problem?

I'm very worried about where the logfile is going. I can't depend on my computer staying up, and the command not erroring, for four months. If I'm going to keep it running for even weeks, I'd like to get that logfile onto a flash drive. I originally thought it was going onto the new, larger hard drive, but now I realize it's likely on the RAM drive the Clonezilla live system is using. Is it safe to insert a formatted USB drive, mount it, copy over the logfile, and then restart ddrescue? Will the Clonezilla shell even recognize a USB stick that wasn't there at boot, so that I can mount it?

I'm assuming I'd try sudo fdisk -l to list the disks, then make a directory (sudo mkdir -p /media/usb), then mount it (sudo mount /dev/sdb1 /media/usb), then copy?
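For the copy step itself, here is a minimal sketch of copying the logfile onto the mounted stick in a way that never leaves a half-written copy under the final name. It assumes the stick is already mounted at /media/usb as in the command above; the function name and the logfile path in the usage line are hypothetical. I have not checked how ddrescue writes its logfile while running, so the cautious assumption is to run this only while ddrescue is stopped.

```python
import os
import shutil


def backup_logfile(src, dest_dir):
    """Copy src into dest_dir via a temporary file; return the final path."""
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + ".tmp"
    with open(src, "rb") as fin, open(tmp_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # push the data out to the device
    os.replace(tmp_path, final_path)  # swap the finished copy into place
    return final_path


# Hypothetical usage (both paths are assumptions, adjust to your setup):
# backup_logfile("/home/user/rescue.log", "/media/usb")
```

Writing to a temporary name first means an older good copy on the stick is only replaced once the new one has been written out completely.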

ANY feedback would be appreciated. I've messed around in a Unix shell a bit and set up a zpool RAID, but always when I knew exactly what I was doing, and not in Linux, let alone a bare-bones version.

countermode
KinaMan

3 Answers

12

In case anyone is interested, or comes across an archived version of this in a few years: I waited the two months, after setting up a log file so that I could resume copying. Twice it just started getting read errors (until the computer was restarted), and once I lost power. After months of copying, I plugged the backup into another laptop via a USB adapter. Probably about 7.5 MB of the ~2 TB wasn't copied (it still had errors after -r3, i.e. 3 retries). The copy was unreadable at first, but I rebuilt the partition table per these instructions: https://perrohunter.com/repair-a-mac-os-x-hfs-partition-table/ - I did have to change the block size, since this drive is much larger than the older drives.

It then worked close to flawlessly. I ran a disk verify and repair, and a permissions repair, in Disk Utility, and it booted up fine.

Real lesson learned? I'm now using Backblaze for the really important files (photos and documents) and a mirrored bootable backup on-site.

KinaMan
  • A friend brought me his 500 GB drive. The first time I set -r3 too, but then ddrescue (I use the GUI in Parted Magic) showed a remaining time of more than 3 months, so I restarted the process with -r1. The rescue process was "fast" until it reached 67 GB. At the moment it recovers "only" 30 GB per day, so I need to wait 2-3 weeks. Because I do not think USB to USB is the best way to copy the data, I used a separate PC (Intel NUC) and installed the drive there (SATA). Do you think I will have the same problems at the end with the partition tables (it's a Windows boot drive = NTFS)? – mgutt Apr 06 '16 at 15:10
  • In the meantime I found this question: http://unix.stackexchange.com/questions/79225/is-this-ddrescue-command-doing-anything I hope my job won't end like that o.O – mgutt Apr 06 '16 at 16:32
  • I don't think so, I didn't see many other people who did. I think I had some issues with the small partition table at the front of the drive. Most of the people who had to fix it had made a dumb mistake like deleting a partition. Let us know how it goes. – KinaMan Apr 09 '16 at 14:32
  • After reading the manual and analyzing the log file I understand much better how ddrescue works and how to interpret the process. I posted an answer with more explanation. – mgutt Apr 10 '16 at 09:04
6

ddrescue does not mark bad sectors until it reaches the second phase: https://www.gnu.org/software/ddrescue/manual/ddrescue_manual.html

(Second phase; Trimming) Trimming is done in one pass. For each non-trimmed block, read forwards one sector at a time from the leading edge of the block until a bad sector is found. Then read backwards one sector at a time from the trailing edge of the block until a bad sector is found. Then mark the bad sectors found (if any) as bad-sector, and mark the rest of the block as non-scraped without trying to read it.
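To make the quoted description concrete, here is a toy model of trimming a single non-trimmed block. It is only a sketch of the manual's wording, not ddrescue's actual code: the function name and the list-of-booleans representation of a block are invented for illustration, and the status characters are the ones used in the log file (+ rescued, - bad sector, / non-scraped).

```python
def trim_block(readable):
    """Toy model: readable[i] says whether sector i of the block can be read.

    Returns one status character per sector after trimming.
    """
    n = len(readable)
    status = ["*"] * n  # the block starts out as non-trimmed

    # Read forwards from the leading edge until a bad sector is found.
    lo = 0
    while lo < n and readable[lo]:
        status[lo] = "+"
        lo += 1
    if lo == n:
        return status  # every sector was readable after all
    status[lo] = "-"

    # Read backwards from the trailing edge until a bad sector is found.
    hi = n - 1
    while hi > lo and readable[hi]:
        status[hi] = "+"
        hi -= 1
    if hi > lo:
        status[hi] = "-"

    # Everything in between is marked non-scraped without being read.
    for i in range(lo + 1, hi):
        status[i] = "/"
    return status


print("".join(trim_block([True, True, False, True, False, True])))
```

Note that in this model the sector between the two bad ones is never read during trimming, even though it happens to be readable. It is simply left as non-scraped for a later phase.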

And the problem is: it can take a looong time until this phase starts. The first phase (copying) comes before it and is divided into three passes:

  1. copy forward and mark blocks as rescued, non-trimmed and non-tried depending on timeouts etc.
  2. copy backwards and read all non-tried blocks
  3. copy forward without skipping to prepare large errors for trimming

Unfortunately nobody can predict how long this first phase will take, as it depends on the number of errors (hours, days, weeks or even months, as in your case).

Note: The --retry-passes=n (-r) flag only matters for the fourth phase:

(Fourth phase; Retrying) Optionally try to read again the bad sectors until the specified number of retry passes is reached.

So reducing the retries does not speed up the first phase and its passes.

But you can see in the ddrescue log file whether it has marked some blocks as "rescued", in which case you can hope it will rescue some or all of the data on the drive. Here is an example:

#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000200  -
0x00117200  0x00001000  /
0x00118200  0x00007E00  *
0x00120000  0x00048000  ?

If the log file contains lines with the + status, there is hope. It means "rescued". But if it only contains ? (non-tried) and * (non-trimmed), I think you can give up. Of course there is a chance that the drive is only defective at the beginning, but I think this is only a small chance. If you can afford to run ddrescue on a second PC, you should try it, depending on how important the data is. The final hope could be to replace the heads or electronics, but this could be expensive.

An alternative way to analyze the logs is to use ddrescue log viewer: https://sourceforge.net/projects/ddrescueview/
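If you just want totals per status without a viewer, the log is easy to summarize yourself. The following is a rough sketch that assumes the block lines look like the example above (position, size, status); the function name is made up, and any line that does not match that shape, such as comments or the current-position line of a real log, is simply skipped.

```python
STATUSES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "rescued",
}


def summarize_log(text):
    """Return total bytes per status character found in the log text."""
    totals = {status: 0 for status in STATUSES}
    for line in text.splitlines():
        fields = line.split()
        # Block lines have three fields and end in a known status character.
        if len(fields) != 3 or fields[2] not in STATUSES:
            continue
        totals[fields[2]] += int(fields[1], 16)
    return totals


EXAMPLE = """\
#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000200  -
0x00117200  0x00001000  /
0x00118200  0x00007E00  *
0x00120000  0x00048000  ?
"""

for status, size in summarize_log(EXAMPLE).items():
    print(f"{STATUSES[status]:>12}: {size} bytes")
```

If the rescued total keeps growing between two runs of this, the drive is still giving up data.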

I use Parted Magic as it contains ddrescue GUI and ddrescue log viewer.

Here you can see a screenshot of the viewer in the middle of phase 1 pass 2 (copy backwards): [screenshot: ddrescue log viewer]

The arrow shows the current position. As you can see this drive has many possible bad sectors (in this phase marked as "non-trimmed") in the middle and that was the reason why I gave up.

mgutt
0

Make sure that your data is still being copied: check the output files, and make sure that their size keeps increasing. Failing hard drives have a tendency to freeze up when you try to copy a lot of data off them at once.

If it looks like it isn't doing anything, it's probably best to stop the operation, then go back and copy one directory at a time, so that if this happens again you will at least have a complete directory. If it is working, you can leave it be for another day or two. Time estimates are often terribly incorrect, but it definitely shouldn't take a month!

I'm not too familiar with ddrescue, but I often use Data Rescue at work, and I've never seen a complete hard drive image finish if it doesn't do so within a day. That being said, it's best to only copy over the directories you need (probably /home), since applications can be reinstalled and settings reconfigured, but documents and pictures can't be replaced.

As far as the log files, I wouldn't touch them while a data rescue utility is running.

  • 1
    It is definitely copying: the file size is growing and still no errors reported. Is it possible to open the logfile or check the free space on the destination drive while it's running? Strangely, its elapsed time is about half of real time (coming up on ~2 days when ~4 was real time) – KinaMan Jan 22 '16 at 23:00
  • It's usually best to not mount a drive when you're running data recovery on it, since the tools often have to use some roundabout mounting method to get failed partitions detected by the OS. I'm sure it probably wouldn't hurt anything, and at worst it would probably just stop the recovery process. I'm always skeptical of drive copies that take > 24hrs, but if it seems to be working, more power to ddrescue! – Trevor Gross Jan 23 '16 at 04:29
  • 1
    STILL going at about 200 GB out of 2 TB with no errors, and it's been about 6 days. Woohoo, only 54 more days to go!? I'll try to update, because most of my Google results have been about 80-300 GB hard drives, not multiple terabytes. – KinaMan Jan 26 '16 at 06:22
  • Well better late than never I guess. Are you able to see the contents of the files it recovers? – Trevor Gross Jan 26 '16 at 12:58
  • Still haven't tried to access the files. Since it's copying the raw data, I figured any files were probably strewn across the drive, and I didn't want to attempt to repair the clone. It did stop and list its first error, at the ~1600 GB it hadn't gotten to. I tried to repeat the ddrescue in reverse, which also failed. – KinaMan Feb 04 '16 at 19:57