10

For some unknown reason, my BTRFS filesystem is corrupted. dmesg prints

BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=43231330304,root=1, slot=47

(more than 1000 times in the dmesg output).

How can I repair block 43231330304?

  • This link has a bit of good information on btrfs repairs. http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html – Brett Holman Nov 04 '19 at 20:15

5 Answers

4

You should install smartmontools and run a long self-test (this will take a while):

# smartctl -t long /dev/sd?

If the disk has a bad block, the test stops there and the self-test log shows:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       80%       682         1193046

This gives you the LBA of the bad block (1193046).

Then install sg3_utils and run sg_verify with the LBA from above:

# sg_verify --lba=1193046 /dev/sda

You will get a response like

# sg_verify --lba=1193046 /dev/sda
verify (10):  Fixed format, current;  Sense key: Medium Error
 Additional sense: Unrecovered read error
  Info fld=0x123456 [1193046]
  Field replaceable unit code: 228
  Actual retry count: 0x008b
medium or hardware error, reported lba=0x123456

This confirms that the sector really is bad and that the disk's firmware could not automatically add it to its defect list.
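sg_verify reports the LBA in hexadecimal (0x123456), while smartctl prints it in decimal. As a quick sanity check that both tools point at the same sector, the value can be converted in the shell. This is only a minimal sketch; hex_to_dec is a hypothetical helper name and not part of sg3_utils.

```shell
#!/bin/sh
# hex_to_dec (hypothetical helper): print a hexadecimal value such as
# 0x123456 as a decimal number, using the shell's printf.
hex_to_dec() {
    printf '%d\n' "$1"
}

# The LBA reported by sg_verify in the example output above
hex_to_dec 0x123456
```

For the value above this prints 1193046, which matches the LBA from the SMART self-test log.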

You can check the disk's grown defect list with

# sg_reassign --grown /dev/sda
>> Elements in grown defect list: 0

If you then reallocate the sector with

# sg_reassign --address=1193046 -v /dev/sda

and check the grown defect list again with

# sg_reassign --grown /dev/sda
>> Elements in grown defect list: 1

you should see the counter grow by 1.

After this, run

# smartctl -t long /dev/sd?

again and repeat the procedure until the long test completes without errors.
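Since the procedure may have to be repeated several times, it can help to pull the failing LBA out of the self-test log automatically. The following is a rough sketch, assuming the log has the column layout shown above (LBA in the last column); first_error_lba is a hypothetical helper name.

```shell
#!/bin/sh
# first_error_lba (hypothetical helper): read a SMART self-test log on stdin
# and print the LBA of the first entry that ended in a read failure.
# Prints nothing if no such entry exists.
first_error_lba() {
    awk '/^# *[0-9]+/ && /read failure/ { print $NF; exit }'
}

# Assumed usage on a real disk (needs root; /dev/sda is a placeholder):
#   smartctl -l selftest /dev/sda | first_error_lba
```

The printed value can then be passed to sg_verify --lba= and sg_reassign --address= as in the steps above.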

After that, I would use this disk only for unimportant data, such as a Steam library. To be safe I would replace the disk, but for the moment it should be OK.

Paulo Tomé
  • 3,782
  • 1
    Note that in order to see the error report e.g. Completed: read failure (or Completed without error or whatever), you need to run a command like this after the allotted time: sudo smartctl -a /dev/sd? And I'll note that in my case, smartctl didn't find any error, but btrfs still seems to have a corruption. – nealmcb Jul 25 '21 at 00:27
1

If the problem comes from a hard-drive failure (e.g. a bad block), it is not repairable.

To check for bad blocks (-n runs a non-destructive read-write test, and the device must not be mounted): badblocks -n /dev/sdX

To find out which files are corrupted, see "How to list files part of a BTRFS block?"

1

Please don't suggest running

btrfs check --repair

unless you know exactly what caused the problem. It should be the last resort, and by that point you should have a working backup in place.

The man page states

Warning: Do not use --repair unless you are advised to do so by a developer or an experienced user, and then only after having accepted that no fsck can successfully repair all types of filesystem corruption. E.g. some other software or hardware bugs can fatally damage a volume.

Paulo Tomé
  • 3,782
0

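To get information on a mounted volume, btrfs device stats /MountPoint prints the per-device error counters (write, read, flush, corruption and generation errors), which give plenty of hints on the state of the filesystem.

For an unmounted volume, btrfs check --repair /dev/TheDevice will check the filesystem and attempt to repair it.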
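To spot problems quickly, the counters can be filtered so that only non-zero ones remain. This is a minimal sketch, assuming the usual output of one "[device].counter value" pair per line; nonzero_counters is a hypothetical helper name.

```shell
#!/bin/sh
# nonzero_counters (hypothetical helper): read "btrfs device stats" output
# on stdin and print only the counters whose value is not zero.
nonzero_counters() {
    awk 'NF == 2 && $2 != 0'
}

# Assumed usage (needs root; /MountPoint is a placeholder):
#   btrfs device stats /MountPoint | nonzero_counters
```

An empty result means that all counters are still at zero.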

0

The BTRFS developers recommend contacting them via IRC or the linux-btrfs mailing list about any (relatively serious) issue with BTRFS. From the BTRFS Wiki FAQ entry "I have a problem with my Btrfs filesystem!":

See the Problem FAQ for commonly-encountered problems and solutions.
If that page doesn't help you, try asking on IRC or the Btrfs mailing list.
Explicitly said: please report bugs and issues to the mailing list (you are not required to subscribe).

See Btrfs mailing list for details on how to post to the mailing list and what information to include when asking for help.

In my case, the simplest resolution of the "corrupt leaf" errors was to delete the affected files, as they didn't contain anything important.

To find out which files are affected by the corrupted leaves:

btrfs inspect-internal logical-resolve 43231330304 <mountpoint>
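When dmesg contains many such messages, the distinct block numbers can be collected first and then passed to the command above one at a time. A rough sketch, assuming the message format quoted in the question ("corrupt leaf ... block=<number>"); corrupt_blocks is a hypothetical helper name.

```shell
#!/bin/sh
# corrupt_blocks (hypothetical helper): read kernel log lines on stdin and
# print each distinct block number from "corrupt leaf" messages once,
# sorted numerically.
corrupt_blocks() {
    grep 'corrupt leaf' | grep -o 'block=[0-9]*' | cut -d= -f2 | sort -un
}

# Assumed usage (reading the kernel log may need root):
#   dmesg | corrupt_blocks
```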

Other general recommendations are to

  • backup data first;
  • run kernel 5.11 or newer, the newer the better (v5.11 introduced more sanity checks for written metadata);
  • use latest btrfs-progs (5.14 as of September 2021);
  • as mentioned by user2246514, do not use "btrfs check --repair" unless advised by a BTRFS developer. From btrfs-check(8) manual page:

Warning: Do not use --repair unless you are advised to do so by a developer or an experienced user, and then only after having accepted that no fsck can successfully repair all types of filesystem corruption. E.g. some other software or hardware bugs can fatally damage a volume.
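The kernel version recommendation from the list above can be checked with a few lines of shell. This is a minimal sketch, assuming GNU sort (for its -V version-sort option); version_ge is a hypothetical helper name.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed if version A >= version B.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the running kernel against the recommended minimum
if version_ge "$(uname -r)" 5.11; then
    echo "kernel is 5.11 or newer"
else
    echo "kernel is older than 5.11"
fi
```

The same helper could be used to compare the version printed by btrfs --version against 5.14, after stripping the leading text from that output.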

Andrey
  • 78