
Background/Context:

I am currently running GNU ddrescue 1.18.1 to recover data from a USB that experienced a cable disconnect while I was writing a virtual disk image onto the disk2s1 partition. Initially I am recovering my second partition (disk2s2) and notice that I have reached the third phase (Splitting). I am placing the image onto a network storage.

Question:

I have noticed that this phase loops. Is there a way to calculate the number of loops I am likely to experience, given my current status information (I am only showing two errors)?

Status:

[Screenshot: ddrescue status screen]

Update/Edit:

So I am still very much interested in how one might estimate the loops/time for completion using the ddrescue tool. Per the comments, I am adding an evaluation of a log file for my disk2s1 partition as that is currently running (the disk2s2 has completed after 14.5 hours, with one user interruption for about 6 hours).

[Screenshot: part1-log]

Completed Partition Log

For the partition which just completed, here is the result of the log inspection.

[Screenshot: photo-log]

Reference (ddrescue algorithm notes):

4 Algorithm


GNU ddrescue is not a derivative of dd, nor is related to dd in any way except in that both can be used for copying data from one device to another. The key difference is that ddrescue uses a sophisticated algorithm to copy data from failing drives causing them as little additional damage as possible.

Ddrescue manages efficiently the status of the rescue in progress and tries to rescue the good parts first, scheduling reads inside bad (or slow) areas for later. This maximizes the amount of data that can be finally recovered from a failing drive.

The standard dd utility can be used to save data from a failing drive, but it reads the data sequentially, which may wear out the drive without rescuing anything if the errors are at the beginning of the drive.

Other programs read the data sequentially but switch to small size reads when they find errors. This is a bad idea because it means spending more time at error areas, damaging the surface, the heads and the drive mechanics, instead of getting out of them as fast as possible. This behavior reduces the chances of rescuing the remaining good data.

The algorithm of ddrescue is as follows (the user may interrupt the process at any point, but be aware that a bad drive can block ddrescue for a long time until the kernel gives up):

1) Optionally read a logfile describing the status of a multi-part or previously interrupted rescue. If no logfile is specified or is empty or does not exist, mark all the rescue domain as non-tried.

2) (First phase; Copying) Read the non-tried parts of the input file, marking the failed blocks as non-trimmed and skipping beyond them. Skip also beyond slow areas. The skipped areas are tried later in two additional passes (before trimming), reversing the direction after each pass until all the rescue domain is tried. The third pass is a sweeping pass, with skipping disabled. (The purpose is to delimit large errors fast, keep the logfile small, and produce good starting points for trimming). Only non-tried areas are read in large blocks. Trimming, splitting and retrying are done sector by sector. Each sector is tried at most two times; the first in this step (usually as part of a large block read, but sometimes as a single sector read), the second in one of the steps below as a single sector read.

3) (Second phase; Trimming) Read forwards one sector at a time from the leading edge of the smallest non-trimmed block, until a bad sector is found. Then read backwards one sector at a time from the trailing edge of the same block, until a bad sector is found. For each non-trimmed block, mark the bad sectors found as bad-sector and mark the rest of that block as non-split without trying to read it. Repeat until there are no more non-trimmed blocks. (Large non-trimmed blocks are produced by concatenation of smaller ones, and its fraction of good data at the edges is therefore smaller).

4) (Third phase; Splitting) Read forwards one sector at a time from the center of the largest non-split block, until a bad sector is found. Then, if the bad sector found is not the first one tried, read backwards one sector at a time from the center of the same block, until a bad sector is found. If the logfile is larger than '--logfile-size', read sequentially the largest non-split blocks until the number of entries in the logfile drops below '--logfile-size'. Repeat until all remaining non-split blocks have less than 7 sectors. Then read the remaining non-split blocks sequentially.

5) (Fourth phase; Retrying) Optionally try to read again the bad sectors until the specified number of retry passes is reached. Every bad sector is tried only once in each pass. Ddrescue can't know if a bad sector is unrecoverable or if it will be eventually read after some retries.

6) Optionally write a logfile for later use.

The total error size ('errsize') is sum of the sizes of all the non-trimmed, non-split and bad-sector blocks. It increases during the copying phase and may decrease during trimming, splitting and retrying. Note that as ddrescue splits the failed blocks, making them smaller, the total error size may decrease while the number of errors increases.
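As a rough illustration of the errsize definition above, here is a minimal Python sketch that sums block sizes per status from a logfile. It assumes the usual ddrescue logfile layout (comment lines starting with `#`, one current-position status line, then `pos size status` block lines using `?`, `*`, `/`, `-`, `+`). The sample text, the function name `summarize_logfile` and the 512-byte sector size are made up for the example. The read count it reports is only an upper bound for trimming plus splitting, based on the statement that each sector is tried at most twice, and it excludes any retry passes.

```python
# Hypothetical sample logfile; the numbers are invented for illustration.
SAMPLE = """\
# Rescue Logfile. Created by GNU ddrescue version 1.18.1
# current_pos  current_status
0x00120000     /
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00010000  /
0x00110000  0x00000200  -
0x00110200  0x00020000  *
0x00130200  0x00ECFE00  +
"""


def summarize_logfile(text, sector_size=512):
    """Sum block sizes per status and derive errsize (sector_size is an assumption)."""
    sizes = {"?": 0, "*": 0, "/": 0, "-": 0, "+": 0}
    seen_status_line = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if not seen_status_line:
            seen_status_line = True  # first data line holds the current position
            continue
        _pos, size, status = line.split()[:3]
        sizes[status] += int(size, 0)  # base 0 accepts the 0x... hex notation
    pending = sizes["*"] + sizes["/"]  # non-trimmed + non-split
    return {
        "errsize": pending + sizes["-"],  # plus bad-sector blocks
        "rescued": sizes["+"],
        "non_tried": sizes["?"],
        # Upper bound on single-sector reads left before any retry passes.
        "max_single_sector_reads": pending // sector_size,
    }


summary = summarize_logfile(SAMPLE)
print(summary)
```

Multiplying the read bound by a guessed time per read gives only a ceiling, since a failing sector can block for as long as the kernel keeps trying.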

The logfile is periodically saved to disc, as well as when ddrescue finishes or is interrupted. So in case of a crash you can resume the rescue with little recopying. The interval between saves varies from 30 seconds to 5 minutes depending on logfile size (larger logfiles are saved at longer intervals).

Also, the same logfile can be used for multiple commands that copy different areas of the input file, and for multiple recovery attempts over different subsets. See this example:

Rescue the most important part of the disc first.
    ddrescue -i0 -s50MiB /dev/hdc hdimage logfile
    ddrescue -i0 -s1MiB -d -r3 /dev/hdc hdimage logfile

Then rescue some key disc areas.
    ddrescue -i30GiB -s10GiB /dev/hdc hdimage logfile
    ddrescue -i230GiB -s5GiB /dev/hdc hdimage logfile

Now rescue the rest (does not recopy what is already done).
    ddrescue /dev/hdc hdimage logfile
    ddrescue -d -r3 /dev/hdc hdimage logfile

Tommie C.
  • is the disk still connected under the same device name at all? Also you should need ddrescue only if the disk has bad blocks, which would not be caused by a "cable disconnect". If you have cable problems, just try a different cable... – frostschutz Aug 12 '14 at 16:00
  • @TommieC. can you try ddrescuelog -t YourLog.txt in another terminal? – Simply_Me Aug 12 '14 at 16:35
  • @Simply_Me Please see the updated question reflecting two results. – Tommie C. Aug 12 '14 at 17:31
  • @frostschutz Please see the updated question for more details. The lost cable connection occurred while the disk was writing and caused problems with the partition table. The cable itself is undamaged. – Tommie C. Aug 12 '14 at 17:33
  • The cable disconnect will usually cause logical errors (i.e. the data on the disk is not 100% valid), but won't cause physical problems with the drive, unless you dropped it at the same time. ddrescue can only try to recover from physical problems and won't help with logical errors at all. For the latter, try fsck or similar. – Udo G Jan 03 '15 at 12:53

1 Answer

Even though the question was asked 10 months ago, the answer might be relevant because the recovery cycle might still be running depending on a few factors! No pun intended.

A reliable time estimate is almost impossible, although you can sometimes get a rough idea as described below. The most obvious reason is that you can't predict how long the drive will take to read a bad sector, and if you want ddrescue to read and retry every single one, it could take a very long time. For example, I'm currently running a recovery on a small 500GB drive that has been going for over 2 weeks, and I possibly have a few days left. Mine is a more complicated situation, though, because the drive is encrypted, and to read anything successfully I have to make sure I get all the sectors that hold partition tables, boot sectors and other important parts of the disk. I'm using techniques in addition to ddrescue to improve my chances with the bad sectors. In other words, your unique situation is important in determining the time to completion.

If by "loops" you mean the number of retries, that's something you determine through the parameters you use. If you mean the total number of passes, that's easily determined by reading about the algorithm in the ddrescue manual (man ddrescue, section "Algorithm: How ddrescue recovers the data").

I'll specifically speak to the numbers in the screen captures you provided. Other situations may have other factors that apply, so take this information as a general guideline.

In the sample you've provided, take a look at ddrescue's running status screen. The total "estimate" of the problem (rescue domain) is given by "errsize". This is the amount of data that is "yet to be read". In the sample it is 345GB. On the next line down, to the right, is "average rate". In the sample it is 583 kB/s.

If the "average rate" were to stay roughly steady, you would have about 7 more days to go: 345 GB / (583 kB/s * 60*60*24 s/day) = 7.18 days (treating GB and kB as binary multiples). The problem is that you can't rely on the 583 kB/s. In fact, the deeper you go into the recovery, the slower the drive gets, since it is reading tougher and tougher areas and doing more retries. So the time to finish can grow well beyond this estimate. All of this depends on how badly the drive is damaged.
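The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the naive estimate, assuming a constant rate (which the answer warns will not hold); the function name `days_remaining` is made up for the example. It also shows that the 7.18 figure corresponds to binary units, while decimal units give a slightly smaller number.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def days_remaining(remaining_bytes, rate_bytes_per_s):
    """Naive estimate: remaining data divided by a constant read rate."""
    return remaining_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Binary units (GiB, KiB), matching the 7.18 figure quoted above.
binary_days = days_remaining(345 * 1024**3, 583 * 1024)
# Decimal units (GB, kB) for comparison.
decimal_days = days_remaining(345e9, 583e3)

print(f"{binary_days:.2f} days (binary units)")    # 7.18
print(f"{decimal_days:.2f} days (decimal units)")  # 6.85
```

Treat the result as a lower bound rather than a forecast, since the rate tends to fall as the rescue reaches the damaged areas.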

The sample you've provided shows that the last "successful read" was over 10 hours ago. That means ddrescue has not really been getting anything from the drive for 10+ hours. It suggests that all or part of that 345GB of data on your drive may be shot. This is very bad news for you.

In contrast, my second 500GB drive, which had just started giving me "S.M.A.R.T" errors, was copied disk to disk (with the log file on another drive), and the whole operation took about 8-9 hours. The last part of it slowed down, but that's still bearable. The very bad drive noted above, meanwhile, is well past 2 weeks of work on 500GB and still has about 4-5% remaining to recover.

HTH and YMMV

LMSingh