
Does the md subsystem output any messages (to syslog/systemd-journal) to indicate that it's running in a degraded state (or anything else that might indicate that it has successfully reacted to a drive failure, as hinted at here)?

For example, I see lots of errors from sd such as "Unrecovered read error", but nothing like "retried successfully on alternate". Maybe no news is good news?

Back in the day, mirroring software/hardware would generate syslog entries that indicated when a device was degraded or otherwise required attention. Does md not do that?

Background: the systems in question are already deployed and are being remotely monitored (via syslog/journald info, so no mdadm or any other interactive commands/access of any sort are available at this point).

jhfrontz

1 Answer


I set up a quick test on a RAID 1 array built from two loop devices.

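# Create two 100 MiB backing files
dd bs=1M count=100 if=/dev/zero >/tmp/0.img
cp /tmp/0.img /tmp/1.img
# Attach each file to the first free loop device and print its name
i0=$(losetup --show --find /tmp/0.img); echo "$i0"
i1=$(losetup --show --find /tmp/1.img); echo "$i1"
# Build a two-device RAID 1 array from the loop devices
mdadm --create /dev/md99 --metadata default --level 1 --raid-devices 2 "$i0" "$i1"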

Marking one half of the mirror as faulty

mdadm --manage /dev/md99 --set-faulty $i1    # For me, $i1=/dev/loop1

gives me these messages from the kernel (amongst other related RAID1 messages):

Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Disk failure on loop1, disabling device
Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Operation continuing on 1 devices.
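
Since the monitoring here is log-based, the two messages above could be picked out of a log stream with a filter along these lines. This is only a minimal sketch: the function name md_degraded_lines is made up for illustration, and the pattern is derived solely from the two lines quoted above, so the wording may differ for other RAID levels or kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or the kernel): reads log lines on
# stdin and prints only those that look like the md failure messages above.
# Assumption: the message wording matches the two sample lines in this answer.
md_degraded_lines() {
    grep -E 'md/raid[0-9]+:md[0-9]+: (Disk failure on [^,]+, disabling device|Operation continuing on [0-9]+ devices)'
}

# Sample input: the two kernel lines quoted above, plus one unrelated line
# that is made up for this sketch and should be filtered out.
sample='Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Disk failure on loop1, disabling device
Oct 6 17:36:10 pi kernel: [4087450.030438] md/raid1:md99: Operation continuing on 1 devices.
Oct 6 17:36:11 pi kernel: unrelated message (made up for this sketch)'

printf '%s\n' "$sample" | md_degraded_lines
```

On a host where the journal is readable, the same filter could presumably be fed from `journalctl -k`; on a remote collector it would be applied to whatever stream the forwarded syslog arrives on.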
Chris Davies