16

I am in the process of salvaging data from a failing 1 TB drive (I asked about it in Procedure to replace a hard disk?). I ran ddrescue from a system rescue USB, with a resulting error size of 557568 B in 191 errors, probably all in /home (I assume what it calls "errors" are not single bad sectors, but runs of consecutive ones).

Now, the several guides I've seen around suggest running e2fsck on the new disk, and I expected it to somehow detect that some files had been assigned "blank" sectors/blocks, so that I would at least know which files could not be saved whole. But no errors were found at all (I ran it without -y to make sure I didn't miss anything). I am now running it again with -c, but at 95% no errors have been found so far; I guess I have a new drive with normal-looking files that have zeroed or random pieces inside, undetectable until one day I open them with the corresponding software, or Linux Mint needs them.

Can I do anything with the old/new drives in order to obtain a list of possibly corrupted files? I don't know how many there could be, since those 191 errors could span several files, but at least the total size is not big; I am mostly concerned about a big bunch of old family photos and videos (1+ MB each), the rest is probably irrelevant or was backed up recently.

Update: the new pass of e2fsck did report something new, which I don't understand at all:

Block bitmap differences:  +231216947 +(231216964--231216965) +231216970 +231217707 +231217852 +(231217870--231217871) +231218486
Fix<y>? yes
Free blocks count wrong for group #7056 (497, counted=488).                    
Fix<y>? yes
Free blocks count wrong (44259598, counted=44259589).
Fix<y>? yes
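
(For reference: block numbers like those above can be checked with debugfs from e2fsprogs to see whether any inode owns them. A minimal sketch, where /dev/sdXN is a placeholder for the rescued partition:)

# Ask which inode (if any) owns each block reported by e2fsck;
# a returned inode can then be mapped to a pathname with "ncheck <inode>".
sudo debugfs -R "icheck 231216947 231216964 231216965 231216970 231217707" /dev/sdXN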
terdon
David Sevilla
  • From what I read here and there, I understand the "Block bitmap differences" stuff a bit, but I fail to see whether I could use it for my problem of finding the corrupted files. – David Sevilla Apr 26 '17 at 15:59
  • You'll need the block numbers of all encountered bad blocks (ddrescue should have given you a list, I hope you saved it), and then you'll need to find out which files make use of these blocks (see e.g. here). e2fsck doesn't help, the bad blocks will now just be empty. – dirkt Apr 26 '17 at 16:08
  • If you mean the mapfile it produces, I do. Do you want to put your comment as an answer so I can accept it? – David Sevilla Apr 26 '17 at 16:31
  • See this Q and the usage of ddrutility that does pretty much what you want: https://askubuntu.com/q/904569/271 – Andrea Lazzarotto Apr 26 '17 at 21:55

5 Answers

6

You'll need the block numbers of all encountered bad blocks (ddrescue should have given you a list, I hope you saved it), and then you'll need to find out which files make use of these blocks (see e.g. here). You may want to script this if there are a lot of bad blocks.

e2fsck doesn't help; it just checks the consistency of the file system itself, so it will only act if the bad blocks contain "administrative" file system information.

The bad blocks in the files will just be empty.

Edit

Ok, let's figure out the block size thingy. Let's make a trial filesystem with 512-byte device blocks:

$ dd if=/dev/zero of=fs bs=512 count=200
$ /sbin/mke2fs fs

$ ll fs
-rw-r--r-- 1 dirk dirk 102400 Apr 27 10:03 fs

$ /sbin/tune2fs -l fs
...
Block count:              100
...
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192

So the filesystem block size is 1024, and we have 100 of those filesystem blocks (i.e. 200 512-byte device blocks). Rescue it:

$ ddrescue -b512 fs fs.new fs.log
GNU ddrescue 1.19
Press Ctrl-C to interrupt
rescued:    102400 B,  errsize:       0 B,  current rate:     102 kB/s
   ipos:     65536 B,   errors:       0,    average rate:     102 kB/s
   opos:     65536 B, run time:       1 s,  successful read:       0 s ago
Finished                                     

$ cat fs.log
# Rescue Logfile. Created by GNU ddrescue version 1.19
# Command line: ddrescue fs fs.new fs.log
# Start time:   2017-04-27 10:04:03
# Current time: 2017-04-27 10:04:03
# Finished
# current_pos  current_status
0x00010000     +
#      pos        size  status
0x00000000  0x00019000  +

$ printf "%i\n" 0x00019000
102400

So the hex units in the ddrescue mapfile are bytes, not blocks of any size. Finally, let's see what units debugfs uses. First, make a file and find its contents:

$ sudo mount -o loop fs /mnt/tmp
$ sudo chmod go+rwx /mnt/tmp/
$ echo 'abcdefghijk' > /mnt/tmp/foo
$ sudo umount /mnt/tmp

$ hexdump -C fs
...
00005400  61 62 63 64 65 66 67 68  69 6a 6b 0a 00 00 00 00  |abcdefghijk.....|
00005410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

So the byte address of the data is 0x5400. Convert this to 1024-byte filesystem blocks:

$ printf "%i\n" 0x5400
21504
$ expr 21504 / 1024
21

and let's also try the block range while we are at it:

$ /sbin/debugfs fs
debugfs 1.43.3 (04-Sep-2016)
debugfs:  testb 0
testb: Invalid block number 0
debugfs:  testb 1
Block 1 marked in use
debugfs:  testb 99
Block 99 not in use
debugfs:  testb 100
Illegal block number passed to ext2fs_test_block_bitmap #100 for block bitmap for fs
Block 100 not in use
debugfs:  testb 21
Block 21 marked in use
debugfs:  icheck 21
Block   Inode number
21      12
debugfs:  ncheck 12
Inode   Pathname
12      //foo

So that works out as expected, except that block 0 is invalid, probably because the file system metadata lives there. So, for your byte address 0x30F8A71000 from ddrescue, assuming you worked on the whole disk and not a partition, we subtract the byte address of the partition start:

210330128384 - 7815168 * 512 = 206328762368

Divide that by the tune2fs block size to get the filesystem block (note that since multiple physical, possibly damaged, blocks make up a filesystem block, numbers needn't be exact multiples):

206328762368 / 4096 = 50373233.0

and that's the block you should test with debugfs.
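
Since doing this by hand for 191 error regions is tedious, the whole conversion can be scripted. Below is a rough sketch, not a tested tool; the mapfile name and device are placeholders, and the partition start sector and block size (here the values from this question, taken from fdisk and tune2fs -l) must be adapted:

#!/bin/bash
# Sketch: convert the '-' (bad) regions of a ddrescue mapfile into
# filesystem block numbers and ask debugfs which inode owns each one.
MAP=rescue.map                  # ddrescue mapfile (hypothetical name)
FS=/dev/sdb5                    # the cloned partition, never the failing disk
PART_START=$((7815168 * 512))   # partition start in bytes (start sector * 512)
BLKSZ=4096                      # filesystem block size from tune2fs -l

# Mapfile data lines are "pos size status"; '-' marks an unrecovered region.
awk '/^0x/ && $3 == "-" { print $1, $2 }' "$MAP" |
while read -r pos size; do
    first=$(( (pos - PART_START) / BLKSZ ))
    last=$(( (pos + size - 1 - PART_START) / BLKSZ ))
    for blk in $(seq "$first" "$last"); do
        debugfs -R "icheck $blk" "$FS" 2>/dev/null    # block -> inode
    done
done
# Feed the printed inode numbers to debugfs -R "ncheck <inode>" to get pathnames.

Running icheck once per block is slow but simple; collecting all the block numbers into a single icheck call would be faster.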

dirkt
  • Great. Now I need a little help figuring out those numbers (my first attempts are not giving me anything useful); I'll look around and open a new question on that if needed. But first, maybe I should be doing the debugfs stuff on the old, failing disk instead of the new one? – David Sevilla Apr 26 '17 at 19:39
  • No, use the new one (or the image, if you've made one). Be careful not to mount the new disk and change anything on it before you have identified the files. – dirkt Apr 26 '17 at 19:45
  • Ok, that makes sense. Now I need to figure out the correspondence between the hex numbers in the ddrescue log file and the blocks in the partition (which is not the first one on the disk). The page you suggested is a good start, but I need to do more than what is said there. – David Sevilla Apr 26 '17 at 19:48
  • You just need the block number of the start of the partition from fdisk etc., and then subtract it from the absolute block numbers. – dirkt Apr 26 '17 at 20:38
  • Well, I tried that before... fdisk gives start=7815168, the first "-" block from ddrescue is 0x30F8A71000, but subtraction gives 210322313216 which testb complains about: "Illegal block number ... for /dev/sc5". I also tried dividing that position by 512(=0x200) or even by 4096(=0x1000) (the latter not making sense because the other positions are not multiples of 1000, only 200). I guess I'm messing up the units somehow. – David Sevilla Apr 26 '17 at 20:51
  • What are the block sizes for fdisk and for the filesystem on your partition (tune2fs)? I didn't have to actually use ddrescue output yet, but the 00 at the end of the block numbers in the mapfile makes them look suspiciously like byte positions instead of blocks. – dirkt Apr 26 '17 at 21:30
  • ddrescuelog may also be helpful. – dirkt Apr 26 '17 at 21:37
  • fdisk says sectors of 1*512=512 bytes, sector size 512/4096, I/O size 4096/4096. tune2fs says block size = 4096. All the mapfile locations are multiples of 0x200; that's why I tried dividing by it. I followed https://tim.purewhite.id.au/2011/04/disk-recovery-which-files-are-damaged/ more or less in that. – David Sevilla Apr 26 '17 at 21:40
  • I tried ddrescuelog, and neither the last number in the output nor that number minus the fdisk offset above is valid for testb. – David Sevilla Apr 26 '17 at 21:43
  • Well, I think I am done! But the results look strange so I will ask about this in a separate question. – David Sevilla Apr 28 '17 at 00:05
  • ... here: https://unix.stackexchange.com/questions/361810/bad-sector-file-diagnosis-did-i-reach-the-right-conclusion-about-corrupted-file – David Sevilla Apr 28 '17 at 00:31
6

NTFS, ext3, ext4

After copying the data off your fail{ing,ed} drive with ddrescue, use ddrutility to find the affected filenames.

I successfully got it to list the affected NTFS files on a 1 TB partition, given a ddrescue mapfile, in under 20 seconds.

It writes its log file in the current directory.

The linked page mentions support for NTFS, ext3 and ext4.

btrfs, zfs

These filesystems have their own built-in scrub function.
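
A minimal sketch of each, with placeholder mount point and pool names:

# btrfs: verify checksums of all data and metadata on a mounted filesystem
sudo btrfs scrub start /mnt/data
sudo btrfs scrub status /mnt/data

# ZFS: scrub the pool, then list any damaged files with -v
sudo zpool scrub tank
sudo zpool status -v tank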

Tom Hale
4

I would recommend an existing utility called ddrutility; it eliminates the tedious manual calculations.

You should run it on your cloned copy (not the original drive), like so:

ddru_findbad /dev/sdb /ddrescue-disk-copy.map

The map file (the second argument) is mandatory here.

The utility is quite smart: it supports different filesystems (even NTFS) and can also test not-yet-split erroneous sectors (temporarily marking them as bad), so you can estimate the damage before the whole ddrescue procedure has finished. Also note that /dev/sdb is used as a whole disk here (not a partition like /dev/sdb1), since the whole disk was originally cloned.

The utility is available in Debian repos and can be installed with:

sudo apt install ddrutility

The project's wiki: https://sourceforge.net/p/ddrutility/wiki/Home

Vladius
1

The easiest way, although not necessarily the fastest or most efficient, would be to:

  1. Run ddrescue normally to rescue the whole drive, and be sure to preserve the mapfile.
  2. Re-run ddrescue in fill-mode to mark the bad sectors with a unique pattern. The ddrescue documentation recommends something like this:
    ddrescue --fill-mode=- <(printf "BAD-SECTOR ") outfile mapfile
    To avoid false positives, use a pattern that would not normally exist in any file.
  3. Mount the rescued image/disk with its native operating system.
  4. Use an appropriate operating system utility, like e2fsck on Linux, to verify and possibly repair the filesystem directory structure. Any bad sectors that fall within filesystem structures must be resolved first, before you can go looking for file corruption.

    Repairing directory structures is an art in and of itself, which is outside the scope of this answer.

  5. Use an appropriate utility provided by the operating system, like grep, to scan all the files on the filesystem and list those that contain the unique pattern fill-mode marked them with; see the sketch after this list.
  6. If necessary, you can examine the files with an appropriate editor to locate the position of the actual data loss by searching for the unique pattern within each file.
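
A minimal sketch of steps 5 and 6 with GNU grep, assuming the marked image is mounted at /mnt/rescued and the fill pattern was "BAD-SECTOR " (both placeholders):

# Step 5: list every file under the mount point that contains the marker
grep -rl --binary-files=text 'BAD-SECTOR ' /mnt/rescued

# Step 6: print the byte offset of each marker inside a suspect file
grep -ob --binary-files=text 'BAD-SECTOR ' /mnt/rescued/path/to/file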

This is operating-system independent, so I'm intentionally not giving details that vary with the specific filesystem type. I first had to do this on an NTFS filesystem using Windows utilities, but it's the same idea on ext3/4, etc.

tlum
-2

I used plain FileZilla and it fixed my problem. Use regular FileZilla to back up all the good data. I noticed that one big file was not copying correctly (it stopped in the middle and restarted the transfer). Luckily I had a previous backup of the same file. Then, to duplicate the disk, I had to find the bad blocks on it using this procedure:

1st, find the problem disk by identifying the HD info using fdisk -l.

2nd, if, let's say, your disk is /dev/sdb, then you need to run the command badblocks -v /dev/sdb; it will list all the bad blocks on the drive. With luck there will be only a few. If no bad blocks are found, then your drive's blocks are OK and you need to figure out something else. My block size is 512, so I use that default number to run dd.

3rd, each block is 512 bytes in size, so what I did was set bs=512.

Each time I ran dd normally, as I always do, my data after the errors came out corrupted. So I then used the parameters explained at https://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html (search for the "For failing disks" part).

dd if=/dev/sdb of=/dev/sda bs=512 conv=noerror,sync iflag=fullblock 

It took a while. Each bad block encountered sounded like banging inside the faulty drive. It copies block by block, and it made the same noise going through all my bad blocks; each time it made the noise, it had found another bad block, and it tells you about it in an error message on the display. What 'conv=noerror,sync' does is pad out bad reads with NULs, while 'iflag=fullblock' caters for short reads and keeps your data in sync up to the end. No corruption at all; it just does not copy the faulty blocks and fills them with NULs instead.

After the copy with dd was done, I just replaced that bad file from a past FileZilla backup, and everything worked OK. I hope this will be useful for others trying to back up faulty drives.

NOTE: My bad blocks were pretty close to each other; they were detected in groups of about 4 blocks at a time. If your bad blocks are scattered all over the disk, several files could be affected. Luckily, in my case, only one big 4 GB database file was affected.

  • Using Filezilla does not answer the question. Moreover, the options you are specifying for dd actually change your data (corrupted reads are filled with padding). This is not an answer to this question and not a proper way to make a raw copy. – Paradox Jul 26 '19 at 03:50
  • Yes, it figured out which file was corrupted, by failing to download it after several retries. That's how I figured it out: I was simply not able to copy the file from the server to my workstation. Only one file failed to download, with many unsuccessful retries on that same file. So it did work for me, and could work for others. I'm happy now; just by replacing that file, the whole system is up and running. – Luis H Cabrejo Jul 27 '19 at 18:54
  • The answer is off-topic regarding the question. End of story. Glad it worked for you, but that is not the question here. This is not a blog or a Reddit thread, so please mind the Code of Conduct and How to Answer. – Paradox Jul 27 '19 at 19:07
  • Even with the dd flags you mentioned, conv=noerror,sync iflag=fullblock, like I said, this is not a raw copy. Therefore, call it "corruption" or something else, but the data you've retrieved in the end, having been filled with arbitrary values rather than the originals, is different from what you had. Imagine a picture you've retrieved this way, with half of its values padded: do you expect to see your picture? If yes, you do not understand what you are doing. It is that simple. – Paradox Jul 27 '19 at 19:17
  • The corrupted file had already been figured out. I was expecting that padding in that file. The rest of the files were copied perfectly. I'm using that copy of the drive with the padding and was able to back it up, copying the drive completely onto another. I did not care much about that file; I had it backed up already. It's obvious that I had already figured out which file was corrupted, since FileZilla couldn't read it to perform the backup and kept retrying until it timed out. The question was whether I was able to figure out the corrupted file, and indeed I nailed it just using common sense and regular FileZilla. – Luis H Cabrejo Jul 29 '19 at 21:04
  • Still, one more time: what you were trying to do is not the question here, and you are sending mixed signals about dd. Here, the OP wants to find out which files were lost when using ddrescue, and you are off-topic. Apart from that, you are blaming dd for corrupting data, which might be true in your case, but is not the usual one. You say you used two alternatives, neither of which is related to the question or even to the reasons you mentioned in the beginning. Long story short, 3 out of 4 times, you were misleading people and off-topic. The question, now, is: why are you still arguing? – Paradox Jul 30 '19 at 08:56
  • I think the critical piece here is that the OP is asking "Can I do anything with the old/new drives in order to obtain a list of possibly corrupted files?" while Luis had a situation where "the corrupted file was already figured out". Luis, I'm going to suggest that we remove this answer, as there may not be enough left to answer the OP's question. @Paradox, thank you for helping Luis understand the shortcomings of their answer. Instead of repeating "off-topic", I think it's most helpful to point out where you see the shortcomings, so that they have a chance to repair them. – Jeff Schaller Jul 30 '19 at 19:43
  • @JeffSchaller That is what I tried to do in my first comment (maybe too briefly). – Paradox Jul 31 '19 at 11:30