Here is the situation. My UPS battery has recently died, and I've not replaced it yet. Today someone came over to do electrical work on the home, which involved shutting down the circuit breaker on which the computer sits. I'm smart enough to start shutting down my computer before I trip the breaker, but I was talking to the electrician and forgot to check that my computer was 100% down before tripping the breaker. The result:

When I log into one account using KDE, I start getting errors like the ones below during the login process. Since I only get these errors when I log in as this user, and only into KDE (other DEs, and other accounts using KDE, are fine), I think it is safe to say that something in my KDE configuration is screwed up.

ata3.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata3.00: irq_stat 0x40000008
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:00:98:b2:78/00:00:13:00:00/40 tag 0 ncq 4096 in
         res 41/40:08:9a:b2:78/00:00:13:00:00/00 Emask 0x409 (media error) <F>
ata3.00: status: { DRDY ERR }
ata3.00: error: { UNC }
ata3.00: SB600 AHCI: limiting to 255 sectors per cmd
ata3.00: SB600 AHCI: limiting to 255 sectors per cmd
ata3.00: configured for UDMA/133
sd 2:0:0:0: [sda] Unhandled sense code
sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda]  Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        13 78 b2 9a 
sd 2:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocate failed
sd 2:0:0:0: [sda] CDB: Read(10): 28 00 13 78 b2 98 00 00 08 00
end_request: I/O error, dev sda, sector 326677146
ata3: EH complete

My question comes in two parts.

1) I'm kind of freaked out that a (relatively speaking) minor corruption in a file system produces this kind of error.

2) My KDE configuration is long and involved, and I don't want to delete the whole thing and start over. Is there a way of logging the progress of the startup sequence to see where it encounters an error, so I can delete just that part of the configuration?

2 Answers

This has nothing to do with KDE; your drive has a bad sector. Since you had a sudden loss of power, what likely happened is that the power went out midway through writing to a sector, leaving it in a bad state. Whatever was being written at the time is lost, but you should be able to correct the problem by writing something to that sector now. The sector number, 326677146, is the one the kernel reported in the end_request line of your log. First, run this:

sudo dd if=/dev/sda bs=512 count=1 of=/dev/null skip=326677146

That should fail with an I/O error. If it does, overwrite that sector with zeros:

sudo dd if=/dev/zero bs=512 count=1 of=/dev/sda seek=326677146

This should succeed, and you should then be able to repeat the first command without error.

Next, use smartctl from the smartmontools package to check the drive for errors. Run sudo smartctl -t long /dev/sda to start the drive's long self-test, and check its progress with sudo smartctl -a /dev/sda. If the test finds more bad sectors, you can try correcting those with dd as well. Also make sure that the reallocated sector count is zero; if it is not, the disk has physical damage and you should think about replacing it.
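As a rough sketch of the two lookups above, the helpers below pull the failing sector numbers out of kernel log text and the reallocated sector count out of the SMART attribute table. The function names bad_sectors and realloc_count are made up for this example, and the second one assumes the usual ten-column layout of smartctl -A output, with the raw value in the last column.

```shell
#!/bin/sh
# Print the sector numbers the kernel reported as unreadable for one disk.
# Reads kernel log text on stdin; the device name (e.g. sda) is the argument.
bad_sectors() {
    sed -n "s|.*I/O error, dev $1, sector \([0-9][0-9]*\).*|\1|p" | sort -un
}

# Print the raw Reallocated_Sector_Ct value from `smartctl -A` output on stdin.
# Assumption: the standard ten-column attribute table (raw value in column 10).
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical use (needs root and a real disk, so left commented out):
#   dmesg | bad_sectors sda
#   sudo smartctl -A /dev/sda | realloc_count
```

The sector can also be cross-checked against the last four bytes of the sense data in the log (13 78 b2 9a): printf '%d\n' 0x1378b29a prints 326677146, the same number as in the end_request line.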

psusi

These errors do not indicate filesystem corruption; they indicate a problem with the disk. It's likely that the power outage caught the disk at a bad moment and one of the heads crashed onto the platter. Disks are supposed to be protected against that, but this doesn't always work (there's often a power glitch before the power goes down completely).

If your disk supports it, run smartctl (from Smartmontools) and badblocks to get a damage assessment. Then make sure you have the whole disk backed up (ddrescue or dd_rescue might help; see "saving data from a failing drive").
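For the backup step, one possible approach with GNU ddrescue is a quick first pass followed by a retry pass over whatever failed. The sketch below only prints the two commands so they can be reviewed before anything touches the disk; rescue_plan is a made-up helper name, and the image and map file paths are placeholders.

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue plan for review.
# Arguments: source device, image file, map file. All paths are placeholders.
rescue_plan() {
    src=$1 img=$2 map=$3
    # Pass 1: copy everything that reads cleanly, skipping the slow scraping phase.
    echo "ddrescue -n $src $img $map"
    # Pass 2: direct disc access, retrying the remaining bad areas up to 3 times.
    echo "ddrescue -d -r3 $src $img $map"
}

rescue_plan /dev/sda /mnt/backup/sda.img /mnt/backup/sda.map
```

The image must go to a different disk than the failing one. Reusing the same map file on the second pass lets ddrescue skip the areas it has already copied.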

I recommend replacing the disk, as its reliability is compromised. If you want to play the lottery and keep using it, run badblocks -o /tmp/badblocks /dev/sdXN and then e2fsck -l /tmp/badblocks /dev/sdXN on the unmounted partition (substitute your own partition for /dev/sdXN) to mark the damaged sectors as unusable.
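One detail worth spelling out: badblocks reports block numbers in its own block size (1024 bytes by default), and e2fsck interprets the list in the filesystem's block size, so the two need to match. A minimal sketch, assuming an ext2/3/4 filesystem on an unmounted partition; fs_block_size and mark_bad_blocks are made-up helper names:

```shell
#!/bin/sh
# Print the filesystem block size from `dumpe2fs -h` output on stdin.
fs_block_size() {
    awk -F: '/^Block size/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Scan an UNMOUNTED ext2/3/4 partition and record bad blocks in the filesystem.
# $1 is a placeholder such as /dev/sdXN; nothing runs until you call this (as root).
mark_bad_blocks() {
    part=$1
    bs=$(dumpe2fs -h "$part" 2>/dev/null | fs_block_size)
    [ -n "$bs" ] || { echo "could not read block size of $part" >&2; return 1; }
    # badblocks and e2fsck must agree on the block size, or the list is misapplied.
    badblocks -b "$bs" -o /tmp/badblocks "$part" &&
        e2fsck -l /tmp/badblocks "$part"
}
```

Running e2fsck -c on the partition is a simpler alternative: it invokes badblocks itself with the right block size, which is what the badblocks man page recommends.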

  • Heads don't crash from power loss, they crash from mechanical damage or shock. – psusi Jul 22 '12 at 01:32
  • @psusi It's an indirect consequence: power loss → head not parked in the right position → it takes a smaller shock to cause damage. – Gilles 'SO- stop being evil' Jul 22 '12 at 12:16
  • Heads haven't needed to be parked before power down since the early '90s. Drives now are designed to keep enough energy in reserve to retract the heads when the power fails, or use a spring or other mechanical means to fail safe. Some drives keep a count of how many times they have had to do an emergency retract and report it via SMART. – psusi Jul 22 '12 at 22:32
  • @psusi In theory, yes. In practice, I've observed a positive correlation between sudden power failures and damaged disks. – Gilles 'SO- stop being evil' Jul 22 '12 at 22:35
  • In my experience, the disk is just fine, but you sometimes get a bad sector because it did not get completely written before the heads retracted (thus, the ECC doesn't add up). You can fix that by simply writing over the sector again and everything is fine, including zero reallocated sectors reported by SMART. – psusi Jul 22 '12 at 22:41
  • Gilles, this answer is way below your standards. Throw away a hard drive without being sure it is damaged? With the price of hard drives what it is? First eliminate as many problems as you can, then run smartctl to test the drive. If smartctl says the drive should be replaced, then and only then throw away the hard drive. – Mouse.The.Lucky.Dog Aug 24 '12 at 03:03