I have a computer with two hard drives in it. One carries the OS and a whole bunch of other stuff; the other was mounted inside /media as a dump-space for an additional terabyte of storage space. Recently, I upgraded the system from Ubuntu Maverick to Debian Jessie, which involved the removal of a bunch of incompatible packages and the installation of a bunch more, and could have broken stuff; it's also possible that the hard drive was dying, and when I rebooted, it decided to give up.

There's nothing utterly crucial on this drive, but I would like to retrieve it rather than rebuild - also, I prefer to know what went wrong, rather than just mask the problem and move on. So what I'm asking for is recommendations on how to debug a strange hard drive failure in which a hard drive's file system is no longer recognized. If the question isn't appropriate here, I apologize, and please redirect me to a better place!

Prior to the upgrade, the primary drive was (I believe) /dev/sda, and the secondary was /dev/sdb. Now the two appear to have swapped names, and they show up as:

$ sudo parted -l
Model: ATA WDC WD1002FAEX-0 (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type     File system  Flags
 1      32.3kB  1000GB  1000GB  primary


Model: ATA ST31000333AS (scsi)
Disk /dev/sdb: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End     Size    Type      File system     Flags
 1      1049kB  997GB   997GB   primary   ext4            boot
 2      997GB   1000GB  3143MB  extended
 5      997GB   1000GB  3143MB  logical   linux-swap(v1)

Note that the file systems on /dev/sdb show up correctly (most of the terabyte as ext4, bootable, and 3GB swap partition), and the partitions on /dev/sda are most likely correct (though I don't have pre-upgrade partition table dumps), but with no file systems listed.

Attempting to fsck /dev/sda1 produces this error:

$ sudo fsck /dev/sda1
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda1

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Trying the likely alternate superblock locations produced nothing helpful, but I'm wondering whether the partition table has the wrong start offset. Is there a way to search for a valid superblock?
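A minimal sketch of such a search, assuming the filesystem really is ext2/3/4: the primary superblock sits 1024 bytes into the filesystem, and its magic field (0xEF53, little-endian) is 56 bytes into the superblock. The function name `find_ext_starts` and the sector-by-sector scan are illustrative only, not part of any existing tool.

```python
import io
import struct

SECTOR = 512
SUPERBLOCK_OFFSET = 1024   # primary ext2/3/4 superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 56          # position of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53         # stored little-endian on disk


def find_ext_starts(dev, max_sectors):
    """Return sector numbers at which an ext2/3/4 filesystem could begin.

    `dev` is any seekable binary file object (a disk image or a device
    opened read-only). Only the first `max_sectors` candidate starts are tried.
    """
    hits = []
    for sector in range(max_sectors):
        dev.seek(sector * SECTOR + SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        raw = dev.read(2)
        if len(raw) < 2:
            break  # ran off the end of the device or image
        if struct.unpack("<H", raw)[0] == EXT_MAGIC:
            hits.append(sector)
    return hits
```

Calling it on the whole disk (for example on a file object from `open("/dev/sda", "rb")`, which needs root) with `max_sectors=4096` would cover the usual start offsets. A two-byte magic can match by chance, and backup superblocks carry the same magic, so any hit is only a candidate to check with a proper tool.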

Also, I'm not 100% certain that this was an ext3/ext4 file system. It's possible that I used a different file system, but I don't know what. Is there any way to explore the partition and figure out what it would be using, such that I can install an additional file system driver?
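One way to explore this is to look for well-known on-disk signatures at a given start offset. The sketch below checks a handful of them; the table is deliberately short and the name `guess_filesystem` is hypothetical. The swap entry assumes a 4 KiB page size. Existing tools such as `blkid -p` or `file -s` do this far more thoroughly.

```python
import io

# (name, byte offset from the start of the partition, signature bytes)
SIGNATURES = [
    ("ext2/3/4", 1024 + 56, b"\x53\xef"),        # s_magic, little-endian 0xEF53
    ("XFS", 0, b"XFSB"),
    ("NTFS", 3, b"NTFS    "),                    # OEM ID in the boot sector
    ("Linux swap", 4096 - 10, b"SWAPSPACE2"),    # assumes 4 KiB pages
    ("Btrfs", 0x10040, b"_BHRfS_M"),
]


def guess_filesystem(dev, start_byte=0):
    """Return the names of all signatures found relative to `start_byte`."""
    matches = []
    for name, offset, signature in SIGNATURES:
        dev.seek(start_byte + offset)
        if dev.read(len(signature)) == signature:
            matches.append(name)
    return matches
```

An empty result does not prove the partition is blank; it may simply mean the start offset is wrong or the filesystem is not in the table.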

Any pointers would be a help. Thanks!

EDIT: doktor5000 suggested I grab testdisk and see what it says.

Disk /dev/sda - 1000 GB / 931 GiB - CHS 121601 255 63
Current partition structure:
     Partition                  Start        End    Size in sectors

No ext2, JFS, Reiser, cramfs or XFS marker
 1 P Linux                    0   1  1 121600 254 63 1953520002
 1 P Linux                    0   1  1 121600 254 63 1953520002
No partition is bootable

Selecting 'Quick Search' produces this:

Disk /dev/sda - 1000 GB / 931 GiB - CHS 121601 255 63
     Partition               Start        End    Size in sectors
>* Linux                    0  32 33 118619 237 18 1905627136
 P Linux Swap           118620  14 51 121601  57 56   47892480

I previously forgot to quote what fdisk said, so here's that:

$ sudo fdisk /dev/sda -l

Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000a56a5

Device     Boot Start        End    Sectors   Size Id Type
/dev/sda1          63 1953520064 1953520002 931.5G 83 Linux

So, the discrepancies I'm seeing are: fdisk reckons the partition starts at 63, but testdisk says 32 (I think); and testdisk says that there's a swap partition there, which doesn't make sense to me (it's a secondary drive, I don't know why I'd have allocated any swap space). But, coolness of coolnesses, I can dig into the file system and copy files off! This is awesome compared to what I had to work with the last time I tried disk recovery - but then, that was back in the 1990s using OS/2, so no surprises there :)
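Part of the discrepancy may be a matter of units: fdisk reports a sector (LBA) number, whereas testdisk's Start and End columns are cylinder/head/sector triples. A sketch of the standard CHS-to-LBA conversion, assuming the 255-head, 63-sector geometry from testdisk's header line (`CHS 121601 255 63`); the helper name `chs_to_lba` is illustrative:

```python
HEADS = 255              # heads per cylinder, from testdisk's header line
SECTORS_PER_TRACK = 63   # sectors per track, from the same line


def chs_to_lba(cylinder, head, sector, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert a cylinder/head/sector triple to a 0-based LBA sector number.

    CHS sector numbers start at 1, hence the `sector - 1`.
    """
    return (cylinder * heads + head) * spt + (sector - 1)


print(chs_to_lba(0, 32, 33))   # start found by Quick Search -> 2048
print(chs_to_lba(0, 1, 1))     # start in the current table  -> 63
```

Under that geometry, testdisk's `0 32 33` is sector 2048 (the 1 MiB boundary), while the current table's `0 1 1` is sector 63, which matches what fdisk prints.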

I'm confident enough to let testdisk write out a new partition table. And yep! All the data's there and readable, and the drive appears to be working just fine. Many thanks, doktor5000! So, follow-up question... any idea how the partition table could have come to be corrupted?

rosuav
  • Yep, testdisk and its companion photorec (which recovers deleted files by their signatures after scanning the whole drive, even when the partition table is already corrupted) are pretty awesome and, even better, easy to use.

    As for how the partition table got corrupted, that's more of a crystal-ball type of question. Something similar can happen when you use several different partitioners on one drive; sometimes they interpret things slightly differently. So it's best to stick to only one. If you like gparted, use it for everything. Like the bundled partitioner of your favorite Linux OS? Use it for everything.

    – doktor5000 May 10 '15 at 16:38
  • @doktor5000: I generally do try to stick to one, but it's entirely possible that the partitioner that came with an ancient Ubuntu is not the same as the one that came with a modern Debian. So that's a reasonably plausible possibility. I like possibilities that don't suggest hardware failure, because I don't like hardware failure. :) – rosuav May 11 '15 at 16:30

1 Answer

The most obvious step would be a deep search for partitions via testdisk.
See their general guide on how to run it and/or the step-by-step documentation.

Then compare the partition table that testdisk found with what you see currently, without changing anything, and maybe post it here.

Another option, at least for ext2/3/4, would be the debugfs command, but this is far more complex and not as straightforward as testdisk.

In any case, if you want to recover what was on that disk, it's probably a good idea to create an image from it, and only work on that image, so you don't risk losing more data. See saving data from a failing drive for some suggestions on how to do that.
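As a rough illustration of the imaging idea, here is a minimal sketch that copies a device to an image file block by block and leaves zeros where a block cannot be read. The function name `image_device` and the 64 KiB block size are arbitrary choices; it relies on `os.pread`, so it is Unix-only, and a purpose-built tool such as GNU ddrescue handles retries and bad areas far better.

```python
import os


def image_device(src_path, dst_path, block_size=64 * 1024):
    """Copy src_path to dst_path; return the offsets of unreadable blocks.

    Unreadable blocks are skipped and stay zero-filled in the image,
    so the image keeps the same size and layout as the source.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            dst.truncate(size)  # pre-size the image; gaps read back as zeros
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    bad_offsets.append(offset)  # note the bad block and move on
                    offset += want
                    continue
                if not data:
                    break  # unexpected end of input
                dst.seek(offset)
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_offsets
```

The image needs as much free space as the source drive, and every later recovery attempt should be run against the image rather than the drive.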

doktor5000