I'm running Debian Stable (Debian 12, Bookworm) on a new Dell PowerEdge R760xd2 server with 24 disks and 256 GByte of RAM. The initial installation (including a reboot into the newly installed OS) worked fine, but now GRUB fails to start:
```
error: no such device: [some UUID].
Loading Linux 6.1.0-17-amd64
error: out of memory.
Loading initial ramdisk ...
error: you need to load the kernel first.
```
As you can see, GRUB is unable to load the kernel itself, so this is unrelated to any possible ramdisk (initrd) issues.
I also observed:
- "Welcome to GRUB!" takes around a minute
- when I remove a (virtual) bootable CD while this happens, I see error messages related to several disks
- in the recovery console, `ls (hd22,gpt1)/` gives `out of memory`
- enabling/disabling Secure Boot does not change any of this
- with a bootable image (grml) in the virtual CD drive, data is read from the device while "Welcome to GRUB!" is shown: 297 MByte for an image of size 493 MByte. With the CD available, the "Welcome to GRUB!" phase takes much longer
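For completeness, this is the kind of manual boot attempt one can make from the GRUB console; the device name `(hd0,gpt2)` is an assumption (my boot RAID1 may enumerate differently), and the kernel version matches the error output above:

```
# assumption: (hd0,gpt2) is the partition holding /boot on the RAID1
set root=(hd0,gpt2)
linux /boot/vmlinuz-6.1.0-17-amd64 root=UUID=... ro
initrd /boot/initrd.img-6.1.0-17-amd64
boot
```

In my case the `linux` step already fails with `out of memory`, which matches the automatic boot behavior.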
I'm using UEFI and created a 500 MByte EFI System Partition (using Debian's installer). The boot device is a hardware RAID1 using two of the disks.
Between the previous successful reboot and the failure, I configured ZFS on 22 of the 24 disks. Furthermore, the remaining storage on the boot RAID1 is now also used as a second zpool (ZFS). I think each of the 22 disks has two (GPT?) partitions, but I don't know why, as I gave ZFS the whole disks.
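To check this from a rescue system, something like the following could be run (the device name `/dev/sdc` and the pool name `tank` are assumptions). As far as I know, when ZFS on Linux is given a whole disk it partitions it itself: a large data partition plus a small (~8 MiB) reserved partition, which would explain the two partitions per disk:

```
# inspect the GPT on one ZFS member disk (device name is an assumption)
parted /dev/sdc unit MiB print

# show whether the pool members were added as whole disks
zpool status tank
```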
My gut feeling is that grub scans all disks and is a bit overwhelmed by the sheer number of disks/partitions.
How can I get the system to boot again?
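For reference, the repair I could attempt from a grml live system would be reinstalling GRUB from a chroot; all device names below are assumptions (the RAID1 boot volume shown as `/dev/sda`, the ESP as `/dev/sda1`, root as `/dev/sda3`):

```
# after booting grml (device names are assumptions)
mount /dev/sda3 /mnt
mount /dev/sda1 /mnt/boot/efi
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt grub-install /dev/sda
chroot /mnt update-grub
```

I don't know whether this alone would help if the underlying problem is GRUB running out of memory while scanning all disks.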
Comments:

- `lsmod` to check loaded modules and `rmmod` to remove them. Not sure if that would free up memory, if it's like the zfs module using it... – frostschutz Jan 18 '24 at 22:34
- [...] boot flag set, on the FIRST DISK ONLY. I believe the UUIDs are assigned to individual zpools, not the actual disks. – eyoung100 Jan 18 '24 at 22:59