
I got a strange-looking warning message in Windows Vista about a potential hard disk failure. I say strange because I have never seen that type of warning in Windows before. It suggested that I back up everything on this disk as soon as possible.

The hard disk in question is the one I use for Ubuntu Linux. I know Windows can't read Linux file systems, not natively anyway, so it's probably the drive's SMART data that caused Windows to warn me about it.

Ever since this happened I can't boot into Ubuntu Linux. Several error lines scroll by, and they do seem to be related to a disk failure. At the end I only get a command prompt; the desktop doesn't load.

Is there a way I can recover from this error? How do I grab the error logs from the command prompt? I would like to post them here.

Here are a few screenshots:

screen1 screen2 screen3

Samir
  • possible duplicate: http://unix.stackexchange.com/questions/61020/issue-with-bad-sectors-on-a-laptop-hard-drive – slm Jul 09 '13 at 20:18
  • Log in using your regular user credentials, then run sudo less /var/log/messages and sudo dmesg | less. (Press q to exit from less.) Use < to jump to the top, > to jump to the bottom, and the up/down arrow keys to scroll. Use paper and pencil to copy what looks relevant. (There might be an easier way, but since we don't know what state your system is in, it's best to play it safe. less is pretty safe.) Note that if you normally use a non-US keyboard layout, some keys may not produce the characters you expect; in that case, try them out first (and use Ctrl+C to cancel any command line). – user Jul 09 '13 at 20:27
  • That's great that you were able to get some screenshots. (Is it really as garbled as in the first screenshot, or is it simply scrolling by faster than your camera took the picture?) – user Jul 09 '13 at 20:35
  • I've seen some of those errors before! Your disk or cable is dying. Try changing the cable first (cheap!) – Tim Jul 09 '13 at 20:44
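For the question of grabbing the error logs, here is a minimal sketch that narrows the kernel log down to disk-looking lines, so there is less to copy by hand. The function name disk_errors and its grep patterns are my own assumptions and are not a complete list of disk messages. On Ubuntu the same kernel messages are usually also written to /var/log/kern.log and /var/log/syslog.

```shell
#!/bin/sh
# disk_errors: keep only kernel log lines that look disk related.
# The patterns are a rough guess at common ATA/disk messages, not an exhaustive list.
disk_errors() {
  grep -iE 'i/o error|ata[0-9]+|bad sector|medium error'
}

# Example usage (sudo will ask for your password):
#   sudo dmesg | disk_errors | less
#   sudo dmesg | disk_errors > disk-errors.txt   # a file you can copy off and post
```

Writing the output to a file on a working drive or USB stick gives you text you can paste into the question instead of transcribing it from the screen.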

2 Answers


Get a boot disk such as SystemRescueCD and run the S.M.A.R.T. tools. This will help you get to the bottom of the SMART error, if that is really what it is.

smartctl --all /dev/{hd?,sd?}
user
Tim
  • +1 for checking SMART. It might be possible to do this from the running system too. That said, given that fsck reports short reads and the kernel complains about I/O problems, both in roughly the same area (around block 164,000), I would definitely suspect a physical disk problem (which might be reported by SMART). Logical errors generally don't cause short reads and I/O errors. Since Windows apparently is able to work fine, I doubt it's the data cable. – user Jul 09 '13 at 20:42
  • I'm pretty sure SMART is indicating failure, that's what the mysterious message from Vista is. – derobert Jul 09 '13 at 21:51
  • By "tools" do you mean the smartctl command? I burned systemrescuecd-x86-3.7.0.iso to a CD and then booted with option 2 (all files cached to memory). I get to root@sysresccd /root %. Here, I type smartctl --all /dev/{hd?,sd?} and hit Enter. zsh: no matches found: /dev/hd? So where do I go from here? Have I done something wrong? – Samir Jul 10 '13 at 08:23
  • @Sammy - he means you need to type smartctl --all /dev/sda, for example, if the drive that's failing is /dev/sda. The way @Tim wrote it, it's a shell pattern (brace expansion plus wildcards) showing all the ways you could specify the device name (/dev/hda or /dev/sda, for example). Each ? stands for any single character, and {hd?,sd?} is a list of alternatives. – slm Jul 10 '13 at 08:39
  • Oh... I thought it meant "for any" hdd. So I have to specify it then? How do I know if it's /dev/sda then? How do I identify the disk drive from a command line? I mean I know what to look for, I know the label, the size of it, and the serial number. But what commands do I use to find if it's sda? – Samir Jul 10 '13 at 08:46
  • @slm Is it safe to assume it's /dev/hda since this is a PATA drive and the only PATA drive in the system? – Samir Jul 10 '13 at 08:55
  • @Sammy - yeah, that seems reasonable to me. Running smartctl is just going to identify that you're having failures, so it isn't going to fix anything. I'm guessing you know this, but just in case. The solutions I offered could potentially fix problems. Given your description of the problem, I jumped ahead and assumed that the disk was having failures. That's typically what happens with them. – slm Jul 10 '13 at 08:58
  • @slm Well, actually no, I'm a Windows person and I'm new to Linux. So the smartctl command is just used to read and analyze SMART data from drives? I had another go at it earlier today. I used the lsblk (list block devices) command to list all devices and partitions, and I was able to identify the faulty drive by its size. The device name was actually sdd, so it's /dev/sdd. Since it's a PATA drive, shouldn't it be hda, hdb, hdc, and so on? Anyway... running smartctl --all /dev/sdd was no fun; it just shows error (SMART?) information. I also had to enable SMART first with -s on. – Samir Jul 10 '13 at 12:42
  • @Sammy - Yes, it just reads the SMART data. In some cases it can prod the HDD into moving (remapping) bad sectors when it accesses them, but in general it just reports. PATA drives aren't necessarily named hd?; kernels that use the libata drivers name them sd? as well. And yes, some systems will disable SMART to cut down on the messages. I'm not sure if that's Windows doing that or Linux distros. – slm Jul 10 '13 at 12:46
  • @slm Actually, I think SMART is disabled in my BIOS. I'll check that... but Windows is able to read SMART data anyway, and so is Linux. So can they override the BIOS setting for SMART? – Samir Jul 10 '13 at 12:52
  • @Sammy - yes I think they can tell the drives to disable SMART. – slm Jul 10 '13 at 13:29
  • @slm I checked, SMART was indeed disabled in BIOS. I have enabled it now. And yes, what you say seems to be true, the OS can decide whether to enable or disable SMART for each hard drive. – Samir Jul 12 '13 at 10:16
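Since the {hd?,sd?} pattern in the answer tripped up zsh on SystemRescueCD, here is a hedged sketch of the same idea that only hands real block devices to smartctl, one device per call. The helper name existing_disks is hypothetical. It assumes a POSIX sh or bash shell (for example, typing bash first at the rescue CD's zsh prompt, assuming bash is installed there), because zsh by default aborts with "no matches found" when a wildcard matches nothing, which is the error in the comments above.

```shell
#!/bin/sh
# existing_disks: print each argument that is an existing block device.
# In sh/bash an unmatched wildcard such as /dev/hd? stays as literal text,
# so the -b test quietly skips it instead of aborting the whole command.
existing_disks() {
  for dev in "$@"; do
    [ -b "$dev" ] && printf '%s\n' "$dev"
  done
  return 0
}

# Example usage, as root from bash on the rescue CD:
#   lsblk -d -o NAME,SIZE,MODEL          # match the disk by size/model first
#   for d in $(existing_disks /dev/hd? /dev/sd?); do
#     echo "=== $d ==="
#     smartctl -s on --all "$d"          # -s on enables SMART on the drive first
#   done
```

Checking one disk per smartctl call matches slm's advice to name a single device such as /dev/sdd.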

I would attempt to repair the disk with either HDAT2 (freeware) or possibly SpinRite (commercial). I've used both of these tools to recover failing disks, and both have worked well in the past.

Once the drive is in a usable state, I'd use Clonezilla to copy it to another HDD as quickly as you can.

slm
  • Absolutely get the data off the drive as soon as you possibly can once it is back in seemingly working condition. At the stage the OP's drive seems to be at, it is likely living on borrowed time. – user Jul 09 '13 at 20:24
  • I burned hdat2_v493.iso to a DVD and booted. When I get to the command prompt I can't type in "hdat2" or anything else. I picked the option to load all the drivers at the first prompt (blue background). I have a wireless Logitech keyboard. Is there a way around this? I mean without getting a wired keyboard?... I'm not sure I even have one. – Samir Jul 10 '13 at 07:38
  • @Sammy - ha, it's always something isn't it? I found this thread, http://hdat2.getphpbb.com/hdat2-problems-f1/keyboard-not-working-after-loading-hdat2-screen-t207-15.html, you might need to enable legacy USB device support in your computer's BIOS to get the keyboard working here. Remember to disable it when you're done. – slm Jul 10 '13 at 07:44
  • Yes, it's always something. In BIOS, I have USB Controller, USB 2.0 Controller, USB Keyboard Support, USB Mouse Support, Legacy USB storage detect, and they are all Enabled. So where do I go from here? Buy a wired keyboard? Ha! – Samir Jul 10 '13 at 08:41
  • @Sammy - yeah, I think that might be your only other option. The only other thing I saw was unplugging and re-plugging the keyboard after hdat2 has started. – slm Jul 10 '13 at 08:46
  • I got a wired keyboard now and it works. But the problem now is that it doesn't seem to recognize the disk drive. The menu shows FFD, PATA, PATA, PATA, USB and another USB. Why are my SATA drives recognized as PATA drives? The three PATAs are actually SATAs. And the two USB drives are removable disk drives (memory card readers). So what now? – Samir Jul 10 '13 at 12:25
  • What am I supposed to do with HDAT anyway? What operation am I supposed to run? What's the name of the option? I played a little with the Demo option. Am I supposed to select a drive, press Enter to show the menu, then the Device tests menu, then the Detect and fix bad sectors menu, and then "fix with verify/write/verify"? That's it? Scan and fix bad sectors? That's the purpose of using HDAT? – Samir Jul 10 '13 at 12:29
  • @Sammy - yes do the verify/write/verify. This will scan and attempt to fix bad sectors on the disk. – slm Jul 10 '13 at 12:33
  • As I expected then... but any idea why the drive is not detected? It's detected by BIOS. And it's detected by Windows. So I think it's safe to say that it's not dead, at least not yet... – Samir Jul 10 '13 at 12:49
  • I got it! HDAT recognized the drive today and I was able to run the "verify/write/verify" option. It found about 10 or 12 (not sure) bad sectors/blocks. I rebooted into Ubuntu. I got the same type of errors, but this time I ran the fsck command and got several prompts to clear some bad errors, references, etc. After doing this I rebooted. I was unable to run the shutdown -r now command because it asked for a username and password, and I have forgotten my password. So I pressed Ctrl+Alt+Del to terminate and reboot. – Samir Jul 11 '13 at 15:17
  • I then booted into recovery at Grub and did the passwd myname and chose 1234 as the new password. I booted into Ubuntu normally, got some crazy video output as if something is wrong with the graphics card, but after some more waiting I finally got to the graphical Ubuntu login page. I typed in my name and the new password 1234 but it said I need to differentiate between lower and upper case. There is no casing in 1234 so I don't know what this is. Is the user name also case sensitive???... – Samir Jul 11 '13 at 15:20
  • How do I get past the login page? Is there a difference between passwd myname and passwd Myname? I used lower case, but I think my username is actually typed with an upper-case letter at login. It's been a while since I used my Ubuntu disk, so I don't remember. Also, is it true that I can use ls /home to find my username? As root? Previously when I did this I would get an error saying something like /bin or some other folder is not included in the PATH environment variable, so the command wasn't found. But now this should work. I mean when I enter recovery mode from the GRUB menu. – Samir Jul 11 '13 at 15:24
  • After entering recovery in Grub menu I choose to "drop to root". That's where I used the passwd myname command. And it said the new password was set but I couldn't log in with it. Maybe it's not registered/saved??... now that I have come so far, can you help me with these last steps with login/user/pass? – Samir Jul 11 '13 at 15:26
  • Now when I enter recovery from the GRUB menu it asks me to "give root password for maintenance". I typed in "muasdf", which is the new password I chose when running passwd at the root prompt. Note that I forgot to type the username after passwd, so I think this is why it's asking for a "maintenance" password. – Samir Jul 11 '13 at 15:46
  • No worries now! I entered recovery from the GRUB menu again, provided the root password, then used the passwd myname command and typed in qwerty as the new password. I also chose to repair packages and fix the X server (if any). I ran shutdown -r now and then picked the normal mode from the GRUB menu. This time I didn't see any video/graphics artifacts during boot. I logged in successfully. But my resolution was limited to 1280 x 1024 or similar. Generic graphics driver loaded?... – Samir Jul 12 '13 at 10:11
  • By the way, when the hard drive was eventually detected and presented in the HDAT2 drive menu, the wireless Logitech keyboard also started to work. Weird... as if one depends on the other. The only thing I did differently was that I used the first option when booting HDAT (load only SATA/PATA drivers), and I shut the computer down completely the day before (no suspend/sleep/hibernation). I think shutting off completely and then starting up the computer made the system initialize the hard drive properly. – Samir Jul 12 '13 at 10:22
  • @Sammy - glad it's working. – slm Jul 12 '13 at 11:18
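For the login trouble in the last few comments, here is a minimal sketch for finding the exact login name from the recovery-mode root prompt. Linux login names are case sensitive, and on a default Ubuntu install they are normally all lowercase, so a capitalized name on the login screen is probably the account's full name rather than the login name. The helper regular_users is hypothetical and assumes Ubuntu's default of regular accounts starting at UID 1000; reading /etc/passwd avoids guessing from the directory names that ls /home shows. In recovery mode the root filesystem may be mounted read-only, in which case it needs remounting read-write before passwd can save a change.

```shell
#!/bin/sh
# regular_users: print the login names of ordinary (non-system) accounts
# from a passwd-format file, /etc/passwd by default.
# Assumes Ubuntu's default UID_MIN of 1000; UID 65534 is the "nobody" account.
regular_users() {
  awk -F: '$3 >= 1000 && $3 < 65534 { print $1 }' "${1:-/etc/passwd}"
}

# Example usage at the recovery-mode root prompt:
#   regular_users              # shows the exact, case-sensitive login name
#   mount -o remount,rw /      # only needed if / is mounted read-only
#   passwd NAME                # NAME is a placeholder for what regular_users printed
```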