TL;DR: Yes. But run scrub afterwards.
The long version:
This LWN article gives the commit text for replace and says:
It is safe to crash or lose power during the operation, the process resumes with the next mount.
I rebooted with my replace at about 5% complete because btrfs replace on RAID1 was super slow with the failed disk present.
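As an aside, how far a running (or resumed) replace has got can be checked at any time with the standard status subcommand, using the same /mountpoint as below:

# reports percentage complete plus any read/write errors for the replace
btrfs replace status /mountpoint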
After replace resumed and completed, I noticed that btrfs device usage /mountpoint was showing some Data,DUP and Metadata,single rather than only RAID1. This was likely due to btrfs writing DUP as it couldn't write a 2nd copy to the failed drive. I rebalanced to make everything RAID1:
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mountpoint
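To double-check the result of the balance, the same usage views can be re-run; both of these should now report only RAID1 profiles for Data and Metadata:

# per-device and per-profile allocation views
btrfs device usage /mountpoint
btrfs filesystem df /mountpoint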
Now that all data was showing as RAID1, I thought I'd check that all was OK:
btrfs scrub start -Bd /mountpoint
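The -B flag keeps scrub in the foreground and -d prints per-device statistics; while it runs, progress can also be polled from another shell:

# same per-device statistics, without waiting for the scrub to finish
btrfs scrub status -d /mountpoint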
I'm glad I did, as I got lots of csum errors at about 151GiB scrubbed on id 2 (the newly replaced device):
scrub device /dev/mapper/vg4TBd3-ark (id 1) status
    scrub started at Mon Jan 28 20:47:33 2019, running for 00:41:40
    total bytes scrubbed: 153.53GiB with 0 errors
scrub device /dev/mapper/vg6TBd1-ark (id 2) status
    scrub started at Mon Jan 28 20:47:33 2019, running for 00:41:40
    total bytes scrubbed: 151.49GiB with 174840 errors
    error details: csum=174840
    corrected errors: 174837, uncorrectable errors: 0, unverified errors: 0
scrub was crawling at this point. The logs showed many lines like:
BTRFS warning (device dm-5): checksum error at logical 3425803567104 on dev /dev/mapper/vg6TBd1-ark, physical 162136981504, root 5, inode 3367374, offset 0, length 4096, links 1 (path: HDDs/Quantum LM30/Linux1/home/tn/uts.old/etc/root/home/tn/build/linux-2.4.13-ac8/linux/include/net/sock.h)
BTRFS error (device dm-5): bdev /dev/mapper/vg6TBd1-ark errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
BTRFS error (device dm-5): fixed up error at logical 3425803567104 on dev /dev/mapper/vg6TBd1-ark
scrub_handle_errored_block: 806 callbacks suppressed
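The running per-device totals from those "errs: wr 0, rd 0, ..." lines can also be queried directly:

# cumulative write/read/flush/corruption/generation error counters
btrfs device stats /mountpoint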
I ended up with a total of 262016 corrected errors when I next checked at 184GiB scrubbed (it was zooming along happily again at this point).
I didn't receive a single error after that, meaning that all the errors were concentrated about the 151GiB point.
151GiB is roughly 5% of my total 2.88TiB, the point at which I restarted.
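(For the record: 5% of 2.88TiB ≈ 0.05 × 2949GiB ≈ 147GiB, within a few GiB of where the errors started appearing.)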
Perhaps it was just coincidence, but I'm glad I ran scrub regardless.
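If you want a clean baseline afterwards, the accumulated error counters can be printed and reset to zero so that any future errors stand out:

# -z prints the counters, then resets them
btrfs device stats -z /mountpoint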