After encountering numerous issues with gigabyte's solution, I gave up and came up with a completely different solution.
Problems:
- The filesystem needs to be aligned
- fsck.msdos doesn't actually handle failed writes to the FAT on the SSD (due to power loss mid-write).
- The mdraid 1.0 trick only seems to work if you don't use write intents, and going without them is a bad idea.
- The Linux kernel doesn't know about ordered writes; writing to /boot/efi can still disable boot if the power goes out in the middle. Too bad; the DOS kernel got it right (by accident).
My solution to the final problem was ultimately to give up on actually mirroring and keep the second SSD with a backup copy of EFI synced from a known good state after upgrades. I assembled a full solution from pieces.
I only have the necessary tools available in 16-bit assembly, so this solution is about as crazy as it looks. I regret that I have nothing better, but quite frankly I'm not going to port the tools to x64 for reputation on Stack Exchange. I needed the 16-bit tools anyway for old DOS games I have on archive. What we have here is FreeDOS doing maintenance work on a modern system, which is both fascinating and horrifying at the same time.
You will need:
- 8086tiny https://github.com/ecm-pushbx/8086tiny
(DOSBox-X works just as well if you have a working SDL console, but you will also need SHUTDOWN.EXE)
- FreeDOS (actually included with 8086tiny, so no more packages)
- My SSDFMT, which actually emits an aligned filesystem
- My flushbuf, because no emulator I could lay hands on actually forces writes through to the disk. I could easily patch 8086tiny to do so, but this is expedient. flushbuf just opens its argument and calls fdatasync on it.
begin-base64 755 /boot/flushbuf
f0VMRgIBAQMAAAAAAAAAAAIAPgABAAAAeAAgAAAAAABAAAAAAAAAAHgAAAAA
AAAAAAAAAEAAOAABAAAAAAAAAAEAAAAFAAAAAAAAAAAAAAAAACAAAAAAAAAA
AAAAAAAA/gAAAAAAAAD+AAAAAAAAAAAQAAAAAAAAWEiD+AJ1QV9fSDH2uAIA
AAAPBUiFwHwJSJe4SwAAAA8F99h0G1C/AgAAAEiNNCX3ACAAugcAAAC4AQAA
AA8FWJe4PAAAAA8FvwIAAABIjTQl4AAgALoXAAAAuAEAAAAPBb8OAAAA69lV
c2FnZTogZmx1c2hidWYgZGV2aWNlCkVycm9yIQo=
====
(Yes, that's an entire Linux x64 binary.)
- The ability to boot removable media in case something goes wrong.
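If you would rather not decode a binary from a forum post, the behaviour described for flushbuf (open the argument, call fdatasync on it) can probably be approximated with GNU coreutils. This is only a sketch: the name flush_device is made up here, it assumes coreutils 8.24 or later (where sync accepts file arguments and --data requests fdatasync), and I have not checked that it behaves identically to the binary above on every kernel.

```shell
#!/bin/sh
# Sketch of a flushbuf stand-in; flush_device is a hypothetical name.
# Assumes GNU coreutils >= 8.24, where sync accepts file arguments and
# --data asks for fdatasync() on each of them.
flush_device() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: flush_device device" >&2
        return 2
    fi
    sync --data "$1"
}
```

Called as flush_device /dev/sda1, it would take the place of /boot/flushbuf /dev/sda1 in the scripts.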
Setting up
This answer is written using /dev/sda1 and /dev/sdb1 as devices. If you don't have a system where the identities of /dev/sda and /dev/sdb are stable, you must use /dev/disk/by-id/... values instead. /dev/disk/by-uuid won't work (after a raw copy, both partitions carry the same filesystem UUID). Trust me on this. You should also write scripts for absolutely everything, because typing /dev/disk/by-id/... devices every time is a pill. I found the best place to keep your scripts is in /boot.
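One way to keep the long by-id names out of the individual scripts is a small helper that each script calls once at the top. This is a sketch only: resolve_dev and ESP_MAIN are hypothetical names, and the by-id path is left elided because it depends on your hardware.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel device that a stable
# /dev/disk/by-id/... link currently points at, or fail if the link is gone.
resolve_dev() {
    if [ ! -e "$1" ]; then
        echo "resolve_dev: $1 not found" >&2
        return 1
    fi
    readlink -f "$1"
}

# Example use in a script kept in /boot (path deliberately left elided):
# ESP_MAIN=$(resolve_dev /dev/disk/by-id/...) || exit 1
```

Failing early when the link is missing means a script never falls through to cat with an empty device name.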
- Get SSDFMT.COM onto 8086tiny's fd.img (mtools or mount -o loop will work).
- As root, do:
(cd /boot/efi && tar -zcf /boot/efi-bak.tgz *)
umount /boot/efi
STTY_SAVE=`stty -g`
stty cols 80 rows 25
./8086tiny bios.bin fd.img hd.img
SSDFMT
- In SSDFMT, select disk 1 (the only disk visible), enter the actual sector size of your SSD, and select compatibility 5. Force LBA won't work with 8086tiny.
- To avoid a wedged console later, we need to install FreeDOS on the hard disk.
QUITEMU
./8086tiny bios.bin fd.img hd.img
SYS C:
XCOPY /e *.* C:\
QUITEMU
/boot/flushbuf /dev/sda1
stty "$STTY_SAVE"
- This installs, but CONFIG.SYS and AUTOEXEC.BAT still point to files on the floppy. Let's fix that.
mount /boot/efi
sed -i 's/A:\\/C:\\/g' /boot/efi/CONFIG.SYS
sed -i 's/A:\\/C:\\/g' /boot/efi/AUTOEXEC.BAT
- Time to put EFI back:
(cd /boot/efi && tar -zxf /boot/efi-bak.tgz)
- Create the script to sync the mirror after upgrading and rebooting (to verify the boot works)
#!/bin/sh -x
umount /boot/efi
/boot/flushbuf /dev/sda1 # replace sda1 and sdb1 with your devices
cat /dev/sda1 > /dev/sdb1
/boot/flushbuf /dev/sdb1
mount /boot/efi
- Create the reverse script (for when the filesystem cannot be repaired)
#!/bin/sh -x
[ -d /boot/efi/EFI ] && umount /boot/efi # probably won't be mounted
/boot/flushbuf /dev/sdb1 # replace sda1 and sdb1 with your devices
cat /dev/sdb1 > /dev/sda1
/boot/flushbuf /dev/sda1
mount /boot/efi
- Run the script given in step 7 (the one that syncs sda1 to sdb1).
Note that the second disk isn't fully registered to boot as an EFI disk yet. I'm pretty sure I fixed this in BIOS setup, not from my OS.
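To confirm that the sync script really produced an identical copy, a byte-for-byte comparison is cheap on a partition this small. A minimal sketch, assuming both partitions are the same size and /boot/efi is unmounted while it runs; mirror_in_sync is a made-up name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the two partitions are byte-identical.
# cmp -s prints nothing and reports the result through its exit status.
mirror_in_sync() {
    cmp -s "$1" "$2"
}

# Example (devices as used in this answer):
# mirror_in_sync /dev/sda1 /dev/sdb1 || echo "mirror differs" >&2
```

If the partitions differ in size, cmp reports a mismatch at end of file even when the shared part is identical, so treat a failure as "look closer" rather than proof of corruption.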
Should fsck.msdos find errors on boot, it might not be able to fix them. If it finds nontrivial errors, the torn write fixer from SSDFMT needs to run first. How? By booting the EFI partition in the emulator (!).
The launcher script should be saved:
#!/bin/sh -x
umount /boot/efi # making sure
./8086tiny bios.bin /dev/null @/dev/sda1
fsck.msdos /dev/sda1
/boot/flushbuf /dev/sda1
mount /boot/efi
Note that running this script leaves you at a DOS prompt after performing the torn write check/repair pass. You can run QUITEMU to return. I'm pretty sure there are honest differences of opinion on whether or not to run CHKDSK inside the emulator. I don't have enough experience to know which is better.
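Since the launcher hands the raw partition to the emulator, it may be worth refusing to start if the partition is somehow still mounted after the umount. The guard below is a sketch under assumptions: dev_is_mounted is a hypothetical name, and it compares against the device name exactly as it appears in the first field of /proc/mounts, which may differ from the name you passed to mount.

```shell
#!/bin/sh
# Hypothetical guard: succeed if device $1 appears as a mount source in the
# mounts table $2 (defaults to /proc/mounts). Field 1 is the source device.
dev_is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example, placed before the 8086tiny line in the launcher:
# dev_is_mounted /dev/sda1 && { echo "still mounted" >&2; exit 1; }
```

The optional second argument exists only so the function can be tried against a saved copy of the mounts table.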
Bonus: I discovered you can get faster boots if you defragment the EFI partition after Debian major updates (as far as I can tell it's the boot logo that causes this). You can grab DEFRAG.EXE from ibiblio, copy it into your EFI partition, and run it later.
cp -a or rsync or some other method that recurses any sub-directories). Unmount /boot/efi and then add /dev/sda4 to the raid-1 with sdb4. This will cause sda4 to be synced with the contents of sdb4. Unmount this raid-1 mirror and remount it as /boot/efi (and don't forget to update /etc/fstab so that it mounts the mirror device instead of /dev/sda4 - use a LABEL or UUID instead of a /dev/ entry). – cas Apr 08 '21 at 13:05
update-grub can & will copy its own boot-loader there, but can't do anything about any others that might have been installed by the bios or other programs or operating systems. Easiest to just copy everything from sda4 to the mirror, it's only a few hundred MB at most, anyway. – cas Apr 08 '21 at 13:12