I recently installed Gentoo Linux on my AMD Ryzen 7 1700X. Now I get segmentation faults under heavy compilation load, and random restarts when the machine is idle.

As a very first step I verified the current microcode version:

grep -m 1 microcode /proc/cpuinfo
microcode       : 0x8001126

However, according to this table the latest microcode should be 0x08001129. It thus seems to be a good idea to update the microcode for the CPU.
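
The comparison above can be sketched in shell. This assumes /proc/cpuinfo has a "microcode" line as shown; the helper names current_ucode and ucode_older_than are made up for illustration, and 0x08001129 is simply the value quoted from the table.

```bash
# Print the microcode revision reported for the first CPU.
# The optional argument (default /proc/cpuinfo) exists only to ease testing.
current_ucode() {
  awk '/^microcode/ { print $3; exit }' "${1:-/proc/cpuinfo}"
}

# Succeed (exit status 0) if revision $1 is numerically older than revision $2.
# Both arguments are expected to be hex literals such as 0x8001126.
ucode_older_than() {
  [ "$(( $1 ))" -lt "$(( $2 ))" ]
}

# Usage (not run here):
#   if ucode_older_than "$(current_ucode)" 0x08001129; then
#     echo "microcode is older than the expected revision"
#   fi
```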

So I emerged =sys-kernel/linux-firmware-20180730 (containing /lib/firmware/amd-ucode/microcode_amd_fam17h.bin). Further, I enabled the following options in the kernel:

CONFIG_MICROCODE=y
CONFIG_MICROCODE_AMD=y

After a reboot, I tried loading the microcode manually (late microcode update):

echo 1 > /sys/devices/system/cpu/microcode/reload

However, when I do this, no new line appears in dmesg:

dmesg | grep microcode
[    0.465121] microcode: CPU0: patch_level=0x08001126
[    0.465514] microcode: CPU1: patch_level=0x08001126
[    0.465932] microcode: CPU2: patch_level=0x08001126
[    0.466394] microcode: CPU3: patch_level=0x08001126
[    0.466772] microcode: CPU4: patch_level=0x08001126
[    0.467159] microcode: CPU5: patch_level=0x08001126
[    0.467537] microcode: CPU6: patch_level=0x08001126
[    0.467908] microcode: CPU7: patch_level=0x08001126
[    0.468268] microcode: CPU8: patch_level=0x08001126
[    0.468653] microcode: CPU9: patch_level=0x08001126
[    0.468999] microcode: CPU10: patch_level=0x08001126
[    0.469409] microcode: CPU11: patch_level=0x08001126
[    0.469744] microcode: CPU12: patch_level=0x08001126
[    0.470136] microcode: CPU13: patch_level=0x08001126
[    0.470455] microcode: CPU14: patch_level=0x08001126
[    0.470757] microcode: CPU15: patch_level=0x08001126
[    0.471092] microcode: Microcode Update Driver: v2.2.

I would expect something like microcode: CPU0: new patch_level=0x08001129. What am I missing here? Some kernel CONFIG_ option? Can I turn on some sort of debug information? Or even better – how can I list the microcode version provided in microcode_amd_fam17h.bin?

DaBler

1 Answer

You could try something like this:

CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_MICROCODE=y
# CONFIG_MICROCODE_INTEL is not set
CONFIG_MICROCODE_AMD=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd_fam17h.bin"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
# CONFIG_FW_LOADER_USER_HELPER is not set

(Note, if you want to list more than one file in CONFIG_EXTRA_FIRMWARE they should be space separated, and their paths should be relative to CONFIG_EXTRA_FIRMWARE_DIR.)
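
A quick way to check that note against a real configuration is to resolve every entry of CONFIG_EXTRA_FIRMWARE against CONFIG_EXTRA_FIRMWARE_DIR and see whether the file exists. The following is only a sketch: check_extra_firmware is a hypothetical helper, and it assumes both options appear as double-quoted strings in the .config file passed to it.

```bash
# check_extra_firmware CONFIG_FILE
# Prints "ok: PATH" or "missing: PATH" for each firmware entry.
# Returns 0 if all files exist, 1 if any is missing, 2 if the directory is unset.
check_extra_firmware() {
  local config="$1" fw_dir fw_list fw status=0
  fw_dir=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE_DIR="\(.*\)"$/\1/p' "$config")
  fw_list=$(sed -n 's/^CONFIG_EXTRA_FIRMWARE="\(.*\)"$/\1/p' "$config")
  if [ -z "$fw_dir" ]; then
    echo "CONFIG_EXTRA_FIRMWARE_DIR is not set in $config" >&2
    return 2
  fi
  # Entries are space separated, so plain word splitting is intended here.
  for fw in $fw_list; do
    if [ -f "$fw_dir/$fw" ]; then
      echo "ok: $fw_dir/$fw"
    else
      echo "missing: $fw_dir/$fw"
      status=1
    fi
  done
  return "$status"
}

# Usage (not run here):
#   check_extra_firmware /usr/src/linux/.config
```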

But that may not work (it works for me with graphics and network firmware only; I haven't tried it with CPU firmware). If it doesn't, try another way: leave CONFIG_EXTRA_FIRMWARE unset (the other options may still be needed, I'm unsure) and instead try early microcode loading by prepending the CPU microcode file to the initramfs file, for example like this (on Gentoo):

/etc/kernel/postinst.d/25-glue_cpu_microcode_to_kernel:

#!/bin/bash

bootdir='/bewt'
initramfsfname="initramfs"
initramfs="$( realpath -- "/${bootdir}/${initramfsfname}" )"
vmlinuz="/${bootdir}/kernel"

prepend_microcode () {
  echo "prepending CPU microcode to ${initramfs}"
  local destfirst="/tmp/initrd/"
  local destmc="${destfirst}/kernel/x86/microcode/"
#  mkdir -p "${destmc}"
    install -dm644 "${destmc}"

  #this will replace the symlink /bewt/initramfs (on gentoo) with the file!
  #but this makes genkernel fail as such: 
  #ln: failed to create symbolic link 'initramfs.old' -> '': No such file or directory
  #even though it doesn't touch the .old file!
  # so to fix this, we'll use realpath above!

  (
    cp -f "/lib/firmware/amd-ucode/microcode_amd.bin" "${destmc}/AuthenticAMD.bin" &&
    cd "${destfirst}" &&
    find . | cpio -o -H newc > "../ucode.cpio" 2>/dev/null &&
    cd .. &&
    cat "ucode.cpio" "${initramfs}" > "/tmp/${initramfsfname}" &&
    chmod a-rwx "/tmp/${initramfsfname}" &&
    mv -f "/tmp/${initramfsfname}" "${initramfs}"
  )
    local ec=$?
    if [[ $ec -eq 0 ]]; then
        echo "success."
    else
        #TODO: make errors be red so it's more obvious
        echo "failed!"
    fi
    return $ec
}

prepend_microcode
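
To check that the prepend took effect, one could look for an uncompressed cpio header and the microcode path at the very start of the initramfs. This is a rough sketch, assuming the archive was built as in the script above (newc format, microcode stored under kernel/x86/microcode/). has_early_ucode is a hypothetical helper, and a positive result only shows that the file is in place, not that the CPU accepted the patch.

```bash
# has_early_ucode INITRAMFS
# Succeeds if the file starts with a newc cpio archive ("070701" magic)
# that names the AMD microcode path within its first 4 KiB.
has_early_ucode() {
  local image="$1"
  [ "$(head -c 6 "$image")" = "070701" ] &&
    head -c 4096 "$image" |
      LC_ALL=C grep -a -q 'kernel/x86/microcode/AuthenticAMD.bin'
}

# Usage (not run here):
#   has_early_ucode /bewt/initramfs && echo "early microcode archive found"
```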

However, genkernel might ignore files in /etc/kernel/postinst.d/ (it did in 2015; that may have been fixed since, or it may still happen three years later, possibly for other reasons). In that case you have to run genkernel yourself to compile the kernel and then manually run all the scripts present in /etc/kernel/postinst.d/. Doing so looks like this:

echo "!! Running genkernel..."
time genkernel all --bootdir="/bewt" --install --symlink --no-splash \
  --no-mountboot --makeopts="-j4 V=0" --no-keymap --lvm --no-mdadm \
  --no-dmraid --no-zfs --no-multipath --no-iscsi --disklabel --luks \
  --no-gpg --no-netboot --no-unionfs --no-firmware \
  --no-integrated-initramfs --compress-initramfs --compress-initrd \
  --compress-initramfs-type=best --loglevel=5 --color --no-mrproper \
  --no-clean --no-postclear --oldconfig
ec="$?"
if test "$ec" -ne "0"; then
  echo "!! genkernel failed $ec"
  exit "$ec"
fi
echo "!! Done genkernel"

list=( `find /etc/kernel/postinst.d -type f -executable | sort --general-numeric-sort` )
echo "!! Found executables: ${list[@]}"
for i in ${list[@]}; do
  ec="-1"
  # keep retrying the same script until it succeeds or the user presses C-c
  while test "0" -ne "$ec"; do
    echo "!! Executing: '$i'"
    time $i
    ec="$?"
    echo "!! Exit code: $ec"
    if test "$ec" -ne "0"; then
      echo "!! something went wrong, fix it then press Enter to retry executing '$i' or press C-c now."
      #exit $ec
      # -p must be followed directly by its prompt string
      time read -s -p "!! Press Enter to re-execute that or C-c to cancel"
    fi
  done
done

(note: the bootdir used above is /bewt instead of /boot, so you might want to change at least that; also the string microcode_amd.bin above should be replaced with yours: microcode_amd_fam17h.bin)
That list= and for above is not the proper way to handle file names, unless they have no spaces, newlines etc. which is obviously assumed above.
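
For file names that may contain spaces or newlines, one safer pattern is to let the shell glob the directory and quote every expansion. The sketch below is a simplified alternative, not a drop-in replacement: run_hooks is a hypothetical helper, it relies on the shell's sorted glob order instead of sort, and it stops at the first failing script rather than offering a retry.

```bash
# run_hooks DIR
# Runs every executable regular file in DIR in glob (sorted) order.
# Returns 1 as soon as one of them fails.
run_hooks() {
  local dir="$1" hook
  for hook in "$dir"/*; do
    # Skip directories, non-executables, and the literal pattern of an empty dir.
    [ -f "$hook" ] && [ -x "$hook" ] || continue
    echo "!! Executing: '$hook'"
    "$hook" || {
      echo "!! '$hook' failed with exit code $?" >&2
      return 1
    }
  done
  return 0
}

# Usage (not run here):
#   run_hooks /etc/kernel/postinst.d
```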

If you want to take a look at an old 4.1.7 kernel .config that did cpu-firmware early loading, see this one.

  • The kernel configuration exactly matches my setting. The early microcode loading also doesn't work. The last thing I didn't try is initrd. I will try it tomorrow... (perhaps?) However, I don't think this is the problem. Currently, I consider adding some debug printk directly to /usr/src/linux/arch/x86/kernel/cpu/microcode... – DaBler Sep 17 '18 at 17:02
  • @DaBler If I were to guess, and I might be wrong here, I'd say that this particular if block isn't entered; the likely reason is that the microcode revision in the CPU is already at least as high as the one on disk. However, I'm aware that this shouldn't be the case for you, according to all the info in the OP. So maybe it's this other if that is entered instead! What motherboard do you have? –  Sep 18 '18 at 06:04
  • Ignore my "What motherboard do you have?" question. I thought that CFG Lock in my BIOS had something to do with updating microcode, but looking into it, it doesn't seem to; it's about C-states and power management. –  Sep 18 '18 at 06:56
  • Please let me know what you find by adding those debug printk, I'm really curious if this if block gets hit. I really think there should be a message when the microcode update fails and it should state current and on-disk microcode revision numbers. (for inspiration see this and boris3of2.patch) –  Sep 18 '18 at 07:06
  • If you want you could try running spectre-meltdown-checker, for it says (to me on Intel) * CPU microcode is the latest known available version: YES (latest known version is 0x96 according to Intel Microcode Guidance, August 8 2018). Maybe you'll get a similar message(?) –  Sep 20 '18 at 09:22
  • Even the microcode stored in an initrd file did not help. I'll try to add some pr_infos as you suggested. – DaBler Sep 20 '18 at 11:45
  • The spectre-meltdown-checker says: CPU microcode is the latest known available version: UNKNOWN (latest microcode version for your CPU model is unknown) – DaBler Sep 20 '18 at 11:49
  • I have just added a lot of pr_info calls to arch/x86/kernel/cpu/microcode/… The call path for early builtin microcode loading goes through apply_microcode_early_amd, scan_containers to parse_container, but never hits this block… which may be the problem. Not sure anyway. The call path for find_microcode_in_initrd goes the same way. In the case of late loading, the path ends with the Not reloading previously-loaded already-in-effect microcode! (in the patch you linked). – DaBler Sep 20 '18 at 13:14
  • I am not sure what to do now. It would be fine to list the content of the microcode file. – DaBler Sep 20 '18 at 13:14
  • I've no idea how to list the content of the microcode file. On another note: assuming you already have SMT disabled, do you have this commit in your kernel ? What about this (non-SMT related) one ? –  Sep 20 '18 at 14:25
  • I have the first commit in my kernel (=sys-kernel/gentoo-sources-4.14.65). I however don't have the second one. (I can try it later.) I have added a lot of debug printings and this is what I currently see in dmesg|grep microcode. The /proc/cpuinfo still reports patch level 0x8001126. – DaBler Sep 21 '18 at 12:58
  • Looks like you should get patch 0x08001227 :D but you don't, because cache_find_patch returns NULL; that's my guess. –  Sep 21 '18 at 15:22
  • Updated dmesg output. It seems that the container contains a patch for rev id 0x8012 whereas my CPU rev id is 0x8011. – DaBler Sep 21 '18 at 16:41
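
On the open question of how to list what microcode_amd_fam17h.bin contains, a minimal shell sketch follows. It assumes the container layout parsed by the kernel's loader in arch/x86/kernel/cpu/microcode/amd.c: a 4-byte magic 0x00414d44, an equivalence table of 16-byte entries (CPUID signature at offset 0, equivalence ID at offset 12), then patch sections of type 1 whose header holds the patch level at offset 4 and the processor revision ID at offset 24. The names read_le and list_amd_ucode are hypothetical, and only a single container per file is handled.

```bash
# read_le FILE OFFSET SIZE: print a little-endian unsigned integer in decimal.
read_le() {
  local file="$1" offset="$2" size="$3" hex="" b
  # od prints bytes in file order; prepending each one reverses them.
  for b in $(od -An -v -tx1 -j "$offset" -N "$size" "$file" 2>/dev/null); do
    hex="$b$hex"
  done
  [ "${#hex}" -eq $((size * 2)) ] || return 1   # short read or past EOF
  echo $((0x$hex))
}

# list_amd_ucode FILE: print equivalence entries and patch headers.
list_amd_ucode() {
  local file="$1" size magic eq_len off end cpuid equiv type len level rev
  size=$(wc -c < "$file") || return 1
  magic=$(read_le "$file" 0 4) || return 1
  if [ "$magic" -ne $((0x00414d44)) ]; then
    echo "not an AMD microcode container: $file" >&2
    return 1
  fi
  eq_len=$(read_le "$file" 8 4) || return 1
  off=12
  end=$((12 + eq_len))
  while [ $((off + 16)) -le "$end" ]; do
    cpuid=$(read_le "$file" "$off" 4) || return 1
    equiv=$(read_le "$file" $((off + 12)) 2) || return 1
    if [ "$cpuid" -eq 0 ]; then break; fi   # all-zero entry ends the table
    printf 'equiv cpuid=0x%08x equiv_id=0x%04x\n' "$cpuid" "$equiv"
    off=$((off + 16))
  done
  off=$end
  while [ $((off + 8)) -le "$size" ]; do
    type=$(read_le "$file" "$off" 4) || return 1
    len=$(read_le "$file" $((off + 4)) 4) || return 1
    # stop at anything that is not a complete patch section
    if [ "$type" -ne 1 ] || [ $((off + 8 + len)) -gt "$size" ]; then break; fi
    level=$(read_le "$file" $((off + 8 + 4)) 4) || return 1
    rev=$(read_le "$file" $((off + 8 + 24)) 2) || return 1
    printf 'patch level=0x%08x equiv_id=0x%04x size=%d\n' "$level" "$rev" "$len"
    off=$((off + 8 + len))
  done
  return 0
}

# Usage (not run here):
#   list_amd_ucode /lib/firmware/amd-ucode/microcode_amd_fam17h.bin
```

Comparing the equiv_id printed for the running CPU's CPUID signature with the equiv_id of each patch would show a mismatch like the 0x8012 versus 0x8011 one reported in the last comment.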