
I have a BTRFS RAID-1 filesystem with 2 legs. One disk needs to be replaced because of recurring read errors.

Thus, the plan is:

  1. add a 3rd leg -> result should be: 3-way mirror
  2. remove the faulty disk -> result should be: 2-way mirror

Thus, I did following steps:

btrfs dev add /dev/new_device /mnt/foo
btrfs balance /mnt/foo

I assume that btrfs does the right thing, i.e. creates a 3-way mirror.

The alternative would be to use a balance filter, I guess. But since the filesystem is already RAID-1, that shouldn't be necessary?
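
I guess such a filter invocation would look something like this (just a sketch - if I read btrfs-balance(8) correctly, the soft modifier skips chunks that already use the target profile):

# convert data and metadata chunks to raid1, skipping chunks that already match (soft)
btrfs balance start -dconvert=raid1,soft -mconvert=raid1 /mnt/foo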

I am a bit concerned because a btrfs fi show prints this:

Before balance start:

    Total devices 3 FS bytes used 2.15TiB
    devid    1 size 2.73TiB used 2.16TiB path /dev/left
    devid    2 size 2.73TiB used 2.16TiB path /dev/right
    devid    3 size 2.73TiB used 0.00B path /dev/new_device

During balancing:

    Total devices 3 FS bytes used 2.15TiB
    devid    1 size 2.73TiB used 1.93TiB path /dev/left
    devid    2 size 2.73TiB used 1.93TiB path /dev/right
    devid    3 size 2.73TiB used 458.03GiB path /dev/new_device

I mean, this looks like btrfs just balances one half of the existing RAID-1 chunks onto the new single disk ... right?

Thus, my question, do I need to specify a balance filter to get a 3-way mirror?

PS: Does btrfs even support n-way mirrors? A note in the btrfs wiki says that it does not - but perhaps it is outdated? Oh boy, cks has a pretty recent article on the 2-way limit.

maxschlepzig

3 Answers


Currently, btrfs does not support n-way mirrors.

Btrfs does have a special replace subcommand:

btrfs replace start /dev/left /dev/new_device /mnt/foo

Reading between the lines of the btrfs-replace man page, this command should be able to read from both existing legs - e.g. for situations where both legs have read errors but the error sets are disjoint.

The btrfs replace command is executed in the background - you can check its status via the status subcommand, e.g.:

btrfs replace status /mnt/foo
45.4% done, 0 write errs, 0 uncorr. read errs
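
If you prefer to block until the replace has finished - or to poll it from a script - something along these lines should work (a sketch; -B runs replace start in the foreground, and -1 makes status print once instead of continuously):

# run the replace in the foreground and wait for it to finish
btrfs replace start -B /dev/left /dev/new_device /mnt/foo

# or, with the background variant, poll the progress once per minute
watch -n 60 btrfs replace status -1 /mnt/foo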

Alternatively, one can also add a device to a RAID-1 filesystem and then delete an existing leg:

btrfs dev add /dev/mapper/new_device /mnt/foo
btrfs dev delete /dev/mapper/right  /mnt/foo

The add should return quickly, since it just adds the device (issue a btrfs fi show to confirm).

The following delete should trigger a rebalance onto the remaining devices such that each extent is again available on each remaining device. Thus, the command potentially runs for a very long time. This method also deals with the situation described in the question.
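
As far as I can tell, the delete has no dedicated status subcommand; a crude way to follow its progress (just a sketch) is to watch the usage of the outgoing device shrink:

# the 'used' value of the device being deleted should drop towards 0.00B
watch -n 60 btrfs fi show /mnt/foo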

In comparison with btrfs replace, the add/delete cycle spams the syslog with low-level info messages. Also, it takes much longer to finish (e.g. 2-3 times longer on my test system with 3 TB SATA drives and 80% FS usage).

Finally, after the actual replacement, if the new devices are larger than the original devices, you need to issue a btrfs fi resize on each of them to utilize the entire available disk space. For the replace example at the top, this looks something like:

btrfs fi resize <devid>:max /mnt/foo

where devid stands for the device ID that btrfs fi show reports.
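
For example, assuming the new disk took over devid 1 of the replaced /dev/left (a sketch - check the actual ID in the btrfs fi show output):

# check which devid the new disk ended up with
btrfs fi show /mnt/foo
# grow that device to use the whole disk (devid 1 here, as an example)
btrfs fi resize 1:max /mnt/foo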

kiko
maxschlepzig
  • +1 for mentioning the replace command. Removing the old drive and adding the new drive isn't really the same as replacing (and removes redundancy that might be necessary in case the first drive has issues too). – basic6 Nov 12 '15 at 12:11
  • I found a 'btrfs dev add ...' followed by a 'btrfs dev delete ...' took 5 days to complete for a relatively small (100 GB) volume with 400 snapshots. I'd strongly recommend using 'replace' - related links: https://lwn.net/Articles/524589/ and https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-replace – David Goodwin Mar 02 '16 at 11:08
  • Another disadvantage of the obsolete delete/add procedure is that it creates a devid "hole", i.e., the id of the deleted drive will not be used anymore. This might make it slightly less convenient to identify a drive, e.g., drive #4 will be the third drive in the array, which consists of drives #1, #2 and #4 after deleting #3. – basic6 Aug 08 '16 at 14:26

Using replace is the preferred solution, and 2-3x faster than balancing (device remove rebalances first; perhaps it doesn't use the soft conversion filter, which would make it slower).

This answer prevents the failed disk from blocking kernel I/O.

I did the following:

  1. Ensured that the degraded filesystem was noauto in /etc/fstab
  2. Rebooted the machine (which took about 20 minutes due to I/O hangs)
  3. Disabled the LVM VG containing the btrfs fs on the failed drive:

    sudo vgchange -an <failed-vg>
    
  4. Disabled the failed device:

    echo 1 | sudo tee /sys/block/sdb/device/delete
    
  5. Mounted the filesystem with -o rw,degraded (note: degraded can only be used once; see the sketch after this list)

  6. Got the failed devid from:

    btrfs filesystem show /mountpoint
    
  7. btrfs replace start -B <devid> /dev/new-disk /mountpoint
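
Putting steps 5-7 together, a minimal sketch - assuming the surviving leg is /dev/mapper/good-left, the new disk is /dev/sdc (both hypothetical names) and the failed device had devid 2:

    # mount the remaining leg read-write in degraded mode
    sudo mount -o rw,degraded /dev/mapper/good-left /mountpoint

    # look up the devid of the missing/failed device
    sudo btrfs filesystem show /mountpoint

    # replace the missing devid (2 here, as an example) with the new disk,
    # blocking until the rebuild has finished
    sudo btrfs replace start -B 2 /dev/sdc /mountpoint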
    

As I'm writing this:

  • replace status shows a healthy 0.1% progress every 30 seconds or so
  • iostat -d 1 -m <target-dev> shows about 145MB/s (Seagate advertises 160MB/s)
Tom Hale
  • Hm, why would one use LVM below btrfs? Anyhow, my question didn't involve LVM, thus, I would say that this answer is a bit off-topic and that the link in the comments to your separate question is sufficient. – maxschlepzig Jan 26 '19 at 11:26
  • @maxschlepzig I want to be able to easily resize LVs and have some filesystems RAID1 while less important filesystems are single. Yes, you didn't state you were using LVM, but others may be. – Tom Hale Jan 26 '19 at 11:31

btrfs replace, which is suggested above, has one drawback - it does not allow replacing a larger device with a smaller one, even if utilization is low. E.g. the scenario of replacing a 1TB HDD with a 500GB SSD does not work, which leaves only the add/remove option.

The only good thing is that writes to the SSD are so fast that the remove depends only on the HDD, going a few GB per minute and utilizing the HDD 100% on a loaded system (seek time hurts throughput a lot; removing 1TB can take a day).
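
For the shrink scenario, the add/remove route looks something like this (a sketch; the device names are placeholders, and the used data must of course fit on the smaller device):

# add the smaller SSD as an additional RAID-1 member
btrfs dev add /dev/small_ssd /mnt/foo
# then remove the large HDD; this relocates its chunks onto the SSD
btrfs dev remove /dev/large_hdd /mnt/foo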

Arunas Bart
  • Yes, very annoying, but understandable given how replace works. Even worse (and without excuse) is that resizing the device to a smaller size does not help; you actually have to unmount, resize the partition, and remount to be able to replace. – Remember Monica Nov 08 '23 at 22:54