
This command:

badblocks -svn /dev/sda

What does it do? Does it just report the bad blocks? Or does it somehow handle the bad blocks so that I don't need to be worried about them?

I read the manual with man badblocks, but I don't understand the -n option:


       -s     Show the progress of the scan by writing out rough percentage completion of
              the current badblocks pass over the disk. Note that badblocks may do multiple
              test passes over the disk, in particular if the -p or -w option is requested
              by the user.

       -v     Verbose mode. Will write the number of read errors, write errors and
              data-corruptions to stderr.

       -n     Use non-destructive read-write mode. By default only a non-destructive
              read-only test is done. This option must not be combined with the -w option,
              as they are mutually exclusive.

The output of running badblocks -svn /dev/sda, which took almost two days:

[screenshot of the badblocks output]

Update

Some posts suggest that running badblocks -svn /dev/sda makes the hard disk controller take care of the bad blocks, but I'm not sure this is true:

"to have the hard disk controller replace bad blocks by spare blocks."

https://askubuntu.com/a/490552/507217

"If you have fully processed your disk this way, the disk controller should have replaced all bad blocks by working ones and the reallocated count will be increased in the SMART log."

https://askubuntu.com/a/490549/507217

SMART

After running badblocks, I checked the SMART table with:

smartctl --all /dev/sda

Note that the raw value of Current_Pending_Sector is 56, twice the 28 bad blocks reported by badblocks. Maybe they are related.
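The factor of two would be consistent with a simple unit difference. As a rough check, assuming badblocks ran with its default block size of 1024 bytes (no -b option was given) and that the drive reports 512-byte logical sectors (an assumption, not confirmed anywhere in this post), each bad block covers two sectors. The function name below is only illustrative:

```python
# Assumptions (not confirmed by the post): badblocks used its default
# 1024-byte block size, and the drive exposes 512-byte logical sectors.
BADBLOCKS_BLOCK_SIZE = 1024
SECTOR_SIZE = 512


def blocks_to_sectors(bad_blocks, block_size=BADBLOCKS_BLOCK_SIZE,
                      sector_size=SECTOR_SIZE):
    """Return the number of sectors covered by `bad_blocks` badblocks blocks."""
    return bad_blocks * block_size // sector_size


print(blocks_to_sectors(28))  # 56
```

If the drive used a different sector size, or badblocks had been run with -b, the ratio would change accordingly.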

[screenshot of the smartctl output]

Error interpretation

According to "How to interpret badblocks output", the badblocks error count has the form reading/writing/comparing. In my case, all 28 errors are read errors, meaning no application can read those blocks.

OS logs

I looked at the OS logs with sudo journalctl -xe. smartd is indeed logging errors about those 56 bad sectors (28 bad blocks):

smartd[1243]: Device: /dev/sda [SAT], 56 Currently unreadable (pending) sectors

[screenshot of the log]

Conclusion

I'd rather back up the data and replace the hard disk before it's too late.

Megidd

1 Answer


The "non-destructive read-write mode" triggered by the -n option writes test data to each block, just like the -w option, and forces the disk either to accept the write, to reallocate a faulty block, or to return a write error.

However, its big win is that it first reads the block it's about to overwrite, and re-writes that data after the test data has been written. This means that after badblocks has completed, the disk should contain the same data as it did before it started running.

Process

  1. Read block and save
  2. Write block of test data
  3. Capture status result and report if necessary
  4. Rewrite saved block
  5. Repeat with next block until done
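
The steps above can be sketched as a toy simulation. This is not the badblocks source: FakeDisk, its methods, and nondestructive_test are hypothetical names, the "disk" is an in-memory list, and the real tool works on batches of blocks with its own test patterns. The sketch follows the behaviour described in the comments on this answer: a block that cannot be read is reported and skipped, and a readable block is reported if the test data does not read back intact.

```python
class FakeDisk:
    """Toy in-memory block device; blocks listed in `unreadable` fail on read."""

    def __init__(self, blocks, unreadable=()):
        self.blocks = list(blocks)
        self.unreadable = set(unreadable)

    def read(self, n):
        if n in self.unreadable:
            raise OSError(f"read error at block {n}")
        return self.blocks[n]

    def write(self, n, data):
        self.blocks[n] = data


def nondestructive_test(disk, pattern=b"\xaa"):
    """Return the list of block numbers that failed the read or write test."""
    bad = []
    for n in range(len(disk.blocks)):
        try:
            saved = disk.read(n)          # 1. read the block and save it
        except OSError:
            bad.append(n)                 # unreadable: report it, skip the write test
            continue
        test = pattern * len(saved)
        disk.write(n, test)               # 2. write a block of test data
        if disk.read(n) != test:          # 3. read back, compare, report if needed
            bad.append(n)
        disk.write(n, saved)              # 4. rewrite the saved block
    return bad                            # 5. the loop repeats until done


disk = FakeDisk([b"abcd", b"efgh", b"ijkl"], unreadable={1})
print(nondestructive_test(disk))  # [1]
print(disk.blocks)                # [b'abcd', b'efgh', b'ijkl']
```

In the sketch, block 1 is never written to, which is why a real disk would get no opportunity to remap such a block during a -n run.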

Caveat

Writing to a good block behaves as expected: the block is written. However, if the write fails, the disk firmware automatically and transparently remaps the block address to one of its spare blocks and retries the write at that new location on the disk. Provided that write succeeds, you won't notice anything different and the disk will seem perfectly normal. (In the SMART table, the Reallocated_Sector_Ct counter increases by one.) Over time the pool of spare blocks may be used up, and from that point on, writes that would have been remapped simply fail.
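
As a rough illustration of that behaviour, here is a toy model of a disk with a limited pool of spare blocks. It is only a sketch under simplifying assumptions: RemappingDisk and its attributes are hypothetical names, real firmware is far more involved, and reallocated_count merely stands in for the SMART reallocation counter.

```python
class RemappingDisk:
    """Toy model: faulty blocks are remapped to spares until the spares run out."""

    def __init__(self, faulty, spares):
        self.faulty = set(faulty)      # block addresses whose original location is bad
        self.spares_left = spares      # size of the spare pool (hypothetical)
        self.remapped = set()          # addresses already moved to a spare block
        self.reallocated_count = 0     # stands in for the SMART reallocation counter

    def write(self, lba):
        """Return True if the write succeeds, False on a write error."""
        if lba not in self.faulty or lba in self.remapped:
            return True                # healthy or already remapped: plain write
        if self.spares_left == 0:
            return False               # nothing left to remap to: the write fails
        self.spares_left -= 1          # transparent remap, then retry the write
        self.remapped.add(lba)
        self.reallocated_count += 1
        return True


disk = RemappingDisk(faulty={3, 7}, spares=1)
print(disk.write(3), disk.reallocated_count)  # True 1
print(disk.write(7), disk.reallocated_count)  # False 1
```

The second write fails because the single spare was consumed by the first remap, which is the situation where the safety net is gone.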

A full disk write test such as the one provided by badblocks with either -w or -n forces writes to all disk blocks (with -n, to every block that can be read), ensuring that they are all available to you, or else highlighting disk blocks that cannot be remapped.

Note that badblocks does not guarantee you haven't lost data: if it cannot read a block, it cannot rewrite it after the test, so it doesn't perform the write test (but does report the block as bad). If badblocks cannot read a block, then no other application would have been able to read it either, and your data is lost.

My recommendation is that if you get any disk blocks that cannot be remapped, you replace the disk as soon as possible, because you no longer have any safety net. (Personally, I would replace such a disk before reaching this stage.) The ddrescue tool may help in copying data from this broken disk to a new one.

Stephen Kitt
Chris Davies
  • So, it just reports the bad blocks and nothing more? Should I be worried about the bad blocks? How can I handle them, so that they no longer are a trouble-maker? – Megidd Dec 21 '21 at 10:00
  • When you said ...to reallocate a faulty block..., does it mean I don't need to be worried about the bad blocks anymore? They won't be a trouble-maker in the future, right? – Megidd Dec 21 '21 at 10:04
  • The real question is, what causes badblocks -svn to report a bad block? I haven’t checked, but I imagine that if it fails the read before the test write, then presumably nothing is written, so blocks which can’t be read aren’t reallocated either, and they will still cause trouble in the future. – Stephen Kitt Dec 21 '21 at 10:06
  • @StephenKitt Alright, that's bad news for me. – Megidd Dec 21 '21 at 10:09
  • @user3405291 I've added some tangential explanation for you. Hope this is useful – Chris Davies Dec 21 '21 at 10:48
  • @roaima Thanks. I'm going to back up the data and replace the hard disk before it's too late. – Megidd Dec 21 '21 at 11:15
  • Looking at the badblocks code, the first step is to read, and if the read fails, the test data isn’t written; so it’s not true that all blocks end up written. With the non-destructive test, a block will be flagged as bad if it fails to read, or if it fails the write test (write test data, read it back and compare). In the former case, the disk won’t have a chance to reallocate it (but that’s standard practice for failing disks — never overwrite data you care about, on the off-chance that a read will succeed at some point). – Stephen Kitt Dec 21 '21 at 11:47
  • @StephenKitt you can also have the situation where the read succeeds but the write fails, losing data that was otherwise just about viable. There's no easy way out at this point. But please do edit my answer if you think it's unclear – Chris Davies Dec 21 '21 at 12:06
  • @roaima yes, and when badblocks writes the original data back, failures are actually ignored AFAICT... – Stephen Kitt Dec 21 '21 at 12:15
  • Well, badblocks -n can't well write to blocks it can't read, since it wouldn't know what to write there... A rewrite with random data would turn a known error into just random invalid data with no way to identify it as such. If you decide to discard any data on the drive, you can still use the list of blocks printed to overwrite and reallocate just those. (In which case you'll hope you'd just done a destructive write test in the first place, since it'd have been faster. But you had no way of knowing.) – ilkkachu Dec 21 '21 at 12:28
  • @StephenKitt that's a nice edit, thank you – Chris Davies Dec 21 '21 at 14:06