9

I ran apt-get upgrade and then apt-get dist-upgrade after being notified today of a new update for Debian 12.

The latter fails with the message below. Further down, the output shows that the failure comes from compiling the NVIDIA driver (I use the one packaged by Debian):

dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
sudo apt-get dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
  libllvm16 linux-headers-6.1.0-18-amd64 linux-headers-6.1.0-18-common linux-image-6.1.0-18-amd64
The following packages will be upgraded:
  linux-headers-amd64 linux-image-amd64 postgresql-14
3 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/119 MB of archives.
After this operation, 593 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Reading changelogs... Done
Preconfiguring packages ...
Selecting previously unselected package libllvm16:amd64.
(Reading database ... 822688 files and directories currently installed.)
Preparing to unpack .../0-libllvm16_1%3a16.0.6-15~deb12u1_amd64.deb ...
Unpacking libllvm16:amd64 (1:16.0.6-15~deb12u1) ...
Selecting previously unselected package linux-headers-6.1.0-18-common.
Preparing to unpack .../1-linux-headers-6.1.0-18-common_6.1.76-1_all.deb ...
Unpacking linux-headers-6.1.0-18-common (6.1.76-1) ...
Selecting previously unselected package linux-headers-6.1.0-18-amd64.
Preparing to unpack .../2-linux-headers-6.1.0-18-amd64_6.1.76-1_amd64.deb ...
Unpacking linux-headers-6.1.0-18-amd64 (6.1.76-1) ...
Preparing to unpack .../3-linux-headers-amd64_6.1.76-1_amd64.deb ...
Unpacking linux-headers-amd64 (6.1.76-1) over (6.1.69-1) ...
Selecting previously unselected package linux-image-6.1.0-18-amd64.
Preparing to unpack .../4-linux-image-6.1.0-18-amd64_6.1.76-1_amd64.deb ...
Unpacking linux-image-6.1.0-18-amd64 (6.1.76-1) ...
Preparing to unpack .../5-linux-image-amd64_6.1.76-1_amd64.deb ...
Unpacking linux-image-amd64 (6.1.76-1) over (6.1.69-1) ...
Preparing to unpack .../6-postgresql-14_14.11-1.pgdg120+1_amd64.deb ...
Unpacking postgresql-14 (14.11-1.pgdg120+1) over (14.10-1.pgdg120+1) ...
Setting up linux-image-6.1.0-18-amd64 (6.1.76-1) ...
I: /vmlinuz.old is now a symlink to boot/vmlinuz-6.1.0-17-amd64
I: /initrd.img.old is now a symlink to boot/initrd.img-6.1.0-17-amd64
I: /vmlinuz is now a symlink to boot/vmlinuz-6.1.0-18-amd64
I: /initrd.img is now a symlink to boot/initrd.img-6.1.0-18-amd64
/etc/kernel/postinst.d/dkms:
dkms: running auto installation service for kernel 6.1.0-18-amd64.
Sign command: /usr/lib/linux-kbuild-6.1/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
env NV_VERBOSE=1 make -j32 modules KERNEL_UNAME=6.1.0-18-amd64........(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.1.0-18-amd64 (x86_64)
Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
run-parts: /etc/kernel/postinst.d/dkms exited with return code 11
dpkg: error processing package linux-image-6.1.0-18-amd64 (--configure):
 installed linux-image-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of linux-image-amd64:
 linux-image-amd64 depends on linux-image-6.1.0-18-amd64 (= 6.1.76-1); however:
  Package linux-image-6.1.0-18-amd64 is not configured yet.

dpkg: error processing package linux-image-amd64 (--configure):
 dependency problems - leaving unconfigured
Setting up libllvm16:amd64 (1:16.0.6-15~deb12u1) ...
Setting up linux-headers-6.1.0-18-common (6.1.76-1) ...
Setting up postgresql-14 (14.11-1.pgdg120+1) ...
Setting up linux-headers-6.1.0-18-amd64 (6.1.76-1) ...
/etc/kernel/header_postinst.d/dkms:
dkms: running auto installation service for kernel 6.1.0-18-amd64.
Sign command: /usr/lib/linux-kbuild-6.1/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
env NV_VERBOSE=1 make -j32 modules KERNEL_UNAME=6.1.0-18-amd64........(bad exit status: 2)
Error! Bad return status for module build on kernel: 6.1.0-18-amd64 (x86_64)
Consult /var/lib/dkms/nvidia-current/525.147.05/build/make.log for more information.
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
dkms: autoinstall for kernel: 6.1.0-18-amd64 failed!
run-parts: /etc/kernel/header_postinst.d/dkms exited with return code 11
Failed to process /etc/kernel/header_postinst.d at /var/lib/dpkg/info/linux-headers-6.1.0-18-amd64.postinst line 11.
dpkg: error processing package linux-headers-6.1.0-18-amd64 (--configure):
 installed linux-headers-6.1.0-18-amd64 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of linux-headers-amd64:
 linux-headers-amd64 depends on linux-headers-6.1.0-18-amd64 (= 6.1.76-1); however:
  Package linux-headers-6.1.0-18-amd64 is not configured yet.

dpkg: error processing package linux-headers-amd64 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for postgresql-common (257.pgdg120+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
  en_us
  fr
Removing obsolete dictionary files:
Processing triggers for libc-bin (2.36-9+deb12u4) ...
Errors were encountered while processing:
 linux-image-6.1.0-18-amd64
 linux-image-amd64
 linux-headers-6.1.0-18-amd64
 linux-headers-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)

To see what it was complaining about, I ran cat on the log file it points to and found an NVIDIA driver compilation problem:

  ld -m elf_x86_64 -z noexecstack --no-warn-rwx-segments   -r -o /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm.o @/var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm.mod 
  {   echo /var/lib/dkms/nvidia-current/525.147.05/build/nvidia.ko;   echo /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-uvm.ko;   echo /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-modeset.ko;   echo /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-drm.ko;   echo /var/lib/dkms/nvidia-current/525.147.05/build/nvidia-peermem.ko; :; } > /var/lib/dkms/nvidia-current/525.147.05/build/modules.order
sh /usr/src/linux-headers-6.1.0-18-common/scripts/modules-check.sh /var/lib/dkms/nvidia-current/525.147.05/build/modules.order
make -f /usr/src/linux-headers-6.1.0-18-common/scripts/Makefile.modpost
   sed 's/ko$/o/'  /var/lib/dkms/nvidia-current/525.147.05/build/modules.order | scripts/mod/modpost -m     -o /var/lib/dkms/nvidia-current/525.147.05/build/Module.symvers -e -i Module.symvers -T - 
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
make[3]: *** [/usr/src/linux-headers-6.1.0-18-common/scripts/Makefile.modpost:126: /var/lib/dkms/nvidia-current/525.147.05/build/Module.symvers] Error 1
make[2]: *** [/usr/src/linux-headers-6.1.0-18-common/Makefile:1991: modpost] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-6.1.0-18-amd64'
make[1]: *** [Makefile:250: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.1.0-18-common'
make: *** [Makefile:82: modules] Error 2

What should I do from here?

Is it dangerous to reboot my computer now?
Isn't the system stuck halfway between 6.1.0-17 and 6.1.0-18?

3 Answers

7

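I suppose you are using the NVIDIA drivers from the Debian distribution. These drivers aren't yet compatible with the new Linux kernel: as the modpost errors in your log show, the non-GPL nvidia.ko module now ends up referencing the GPL-only symbols __rcu_read_lock and __rcu_read_unlock, which, judging by the patch below, come in through pfn_valid as of 6.1.76.

Someone wrote a patch for three files in the source code of the driver's kernel module. I haven't tested this patch because my apt-get is broken, so I can't install the driver.

Still, here is how to proceed.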

First file: /usr/src/nvidia-current-525.147.05/common/inc/nv-linux.h

At line 2000 add the following lines:

#if defined(CONFIG_HAVE_ARCH_PFN_VALID) || LINUX_VERSION_CODE < KERNEL_VERSION(6,1,76)
#  define nv_pfn_valid pfn_valid
#else
/* pre-6.1.76 kernel pfn_valid version without GPL rcu_read_lock/unlock() */
static inline int nv_pfn_valid(unsigned long pfn)
{
    struct mem_section *ms;

    if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
        return 0;

    if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
        return 0;

    ms = __pfn_to_section(pfn);
    if (!valid_section(ms))
        return 0;

    return early_section(ms) || pfn_section_valid(ms, pfn);
}
#endif

Second file: /usr/src/nvidia-current-525.147.05/nvidia/nv-mmap.c

At line 578, replace pfn_valid with nv_pfn_valid.

Third file: /usr/src/nvidia-current-525.147.05/nvidia/os-mlock.c

At line 116, replace pfn_valid with nv_pfn_valid.

At line 190, replace pfn_valid with nv_pfn_valid.

Source of the patch: link. It was originally written for 470.223.02, so its line numbers differ from the ones given here.
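
The replacements in the second and third files could be scripted roughly as follows. This is only a sketch and not part of the original patch: the function name rename_pfn_valid is hypothetical, it assumes GNU sed (for \b and -i), and it rewrites every standalone pfn_valid in the file rather than only the lines listed, so check with grep -n pfn_valid beforehand. It should not be run on nv-linux.h, where the "#  define nv_pfn_valid pfn_valid" line must keep the plain name.

```shell
#!/bin/sh
# Hypothetical helper: replace standalone pfn_valid with nv_pfn_valid in one file.
# Assumes GNU sed. The \b word boundaries leave existing nv_pfn_valid
# occurrences (and longer identifiers containing pfn_valid) untouched.
rename_pfn_valid() {
    file=$1
    # -i.orig edits in place and keeps the previous content as "$file.orig".
    sed -i.orig 's/\bpfn_valid\b/nv_pfn_valid/g' "$file"
}

# Intended use (as root), with the paths given in this answer:
#   rename_pfn_valid /usr/src/nvidia-current-525.147.05/nvidia/nv-mmap.c
#   rename_pfn_valid /usr/src/nvidia-current-525.147.05/nvidia/os-mlock.c
```

Running the helper a second time on the same file would overwrite the .orig backup with the already patched version, so keep a separate copy if you want to be able to revert.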

Bertrand125
  • 1,058
  • Yes, I'm using the driver from the Debian distribution. I might keep my computer turned on and avoid rebooting it while it is unstable, running apt-get update a few times a day until the NVIDIA update arrives. Is it actually unstable in its current state? I upvoted your answer because it's surely a good solution, but I confess that I'm afraid to apply it. – Marc Le Bihan Feb 11 '24 at 10:41
  • I'm also wary of applying it, in case an error in the patch damages the graphics hardware. I don't know whether your system is unstable in its current state, but the old nvidia kernel module should already be loaded in memory, along with the old 6.1.0-17 kernel. If kernel 6.1.0-17 is still installed, you can uninstall kernel 6.1.0-18, and the nvidia kernel module should then be rebuilt for the old kernel. You will then be able to shut down or reboot the computer. – Bertrand125 Feb 11 '24 at 10:55
  • Is it normal to ship an update when a driver likely used by more than 10% (I think) of users of the bundled Debian distribution isn't ready? The Debian team checks the distribution, it doesn't work, and: "No matter! We're shipping it anyway!" Or is this part of the run of failed upgrades we've encountered these last months, a sign that something is going wrong in the Debian team? A lack of rigor? A lack of testing? – Marc Le Bihan Feb 11 '24 at 11:05
  • I don't know what happened in the Debian team. Maybe a lack of testing, probably because there is a lot to test in the system and many possible hardware combinations. Regarding my last comment, you should be able to roll back to 6.1.0-17 with the following command: apt-get remove linux-image-6.1.0-18-amd64, and if needed reinstall the previous kernel: apt-get install linux-image-6.1.0-17-amd64. – Bertrand125 Feb 11 '24 at 11:16
  • 2
    @MarcLeBihan https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063675 "[ Tests ] Only module building can be (and has been) tested. Everything else requires use of nvidia hardware and the driver." (answers your comment). "* Apply pfn_valid patch from gentoo to fix kernel module build for Linux 6.1.76, 6.6.15, 6.7.3, 6.8." – A.B Feb 11 '24 at 12:33
  • @A.B. Yes, it's a lack of rigor. A few people around me have hit the problem with today's release, and it's Sunday. On Monday, when computers start updating automatically, fail, and then fail to reboot, this bug will go from urgency=medium to high. "Only module building can be (and has been) tested," he wrote. And obviously, the module doesn't build. I own one of the most common video cards, and plenty of people like me will run into the same trouble. Did they not try installing linux-image-6.1.0-18-amd64 on different computers before publishing it? Incredible. – Marc Le Bihan Feb 11 '24 at 15:39
  • 1
    @MarcLeBihan The problem is not with the hardware, but with the proprietary driver. As most Debian maintainers have an ethical problem with non-free software, I am not surprised it is not tested very well. – sadfasdfasdfddd Feb 12 '24 at 14:41
  • 1
    Ran into the same issue and found this thread the other day. Been watching the bug report, and just keeping my system on. It looks like it's "fixed" in unstable, but there was no mention of when this fix would be pushed to stable. So am I right in assuming that basically everyone running nvidia-driver just has to not reboot until this eventually shows up in stable? I really don't want my system to go down, and to scramble to figure out a way to fix it, as I have work to do daily. What an absolute disaster! Why would they push this kernel in the first place? Then not address it immediately! – KFish Feb 14 '24 at 19:51
  • Thank you. It worked for me – ASRodrigo Feb 16 '24 at 12:36
  • 1
    @KFish The severity of "grave" they set for that bug is the last one before the highest, "critical". In the comments, someone wrote that it's the first time in years that Debian has been this broken. They have a big testing problem, and I hope they will discuss it and improve the quality of their releases. – Marc Le Bihan Feb 16 '24 at 14:11
  • In response to all these comments, and as a proficient user who uses Gentoo as a daily driver, when not gaming/Steaming, I can say that borrowing patches or patchsets between distributions is very common. The reason the driver blew up with regard to Debian has to do with the fact that Debian prefers security and stability balanced with ease of use etc. Gentoo as a rolling release on the other hand is very agile, as there is no release schedule. It's a rolling release. There are developers at Debian who feel your pain. See ReleaseProposals – eyoung100 Feb 16 '24 at 20:42
  • Furthermore, as seen in the Driver Archives, 525.147.xx was released on 31-OCT-2023 to coincide with the release of kernel version 6.6 (see Table). That same table states that version 6.1 was released on 11-DEC-2022. If proper versioning had been followed when the Debian tree was frozen, the NVIDIA driver frozen in the repository should be 525.60, released 28-NOV-2022. By my math, the driver you're trying to install is 87 point releases ahead of what's proper. – eyoung100 Feb 16 '24 at 21:06
  • Well, it looks like they pushed out the update. I updated my system, reinstalled linux-image-6.1.0-18-amd64 to trigger the kernel module build again, rebooted, and all is well. Great success! – KFish Feb 17 '24 at 00:53
  • @eyoung100 That's contradictory. If "the reason the driver blew up with regard to Debian has to do with the fact that Debian prefers security and stability balanced with ease of use etc.", why are they publishing a release without testing it properly? If you look at the bug history (start of February), the negligent maintainer noticed himself that the driver was causing trouble, but wrote that it wasn't a problem because they weren't releasing at that time. They aren't watching for stability: plenty of computers were hit by kernel panics, and proper testing would have shown this. It's abnormal. – Marc Le Bihan Feb 17 '24 at 08:24
  • @MarcLeBihan You missed the point of that whole comment: Debian does prefer security and stability balanced with ease of use. The point of my comment was that the nvidia-driver that was released recently into the main repository was not properly tested. 525.147.xx was released around the time version 6.6 of the Linux kernel was released. The last release that should be in the main repository is 525.60.xx. The question raised by that comment is: why, for people on NVIDIA, when releasing a driver can you not release the updated kernel? Which again is Debian's choice. – eyoung100 Feb 19 '24 at 21:14
5

This has now been fixed in Bookworm, see the announcement for details. Ensure that bookworm-updates is present in your repository configuration (/etc/apt/sources.list):

deb https://deb.debian.org/debian bookworm-updates main contrib non-free non-free-firmware

(The announcement doesn’t mention contrib, non-free, and non-free-firmware, but they are necessary in this instance.)

Then run apt update and apt upgrade as root, as usual.
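
Before upgrading, you can check whether the suite is already configured. The following is a minimal sketch: the function name has_suite is hypothetical, and it only recognizes classic one-line deb entries such as the one above, not deb822-style .sources files.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an active (uncommented)
# one-line "deb" entry whose fields include SUITE as a whole word.
# deb-src lines and commented-out lines are ignored.
has_suite() {
    file=$1
    suite=$2
    grep -Eq "^[[:space:]]*deb[[:space:]]+([^#]*[[:space:]])?${suite}([[:space:]]|\$)" "$file"
}

# Intended use:
#   has_suite /etc/apt/sources.list bookworm-updates || echo "bookworm-updates is missing"
```

If the check fails, add the deb line shown above, then run apt update and apt upgrade as root.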

Stephen Kitt
  • 434,908
1

There is a patched driver available now via the proposed-updates package archive.

To get the preliminary update, add the following to your /etc/apt/sources.list and then run sudo apt update. Once that is finished you should be able to upgrade the nvidia-driver-* packages which address this problem.

deb https://ftp.debian.org/debian/ bookworm-proposed-updates contrib main non-free non-free-firmware
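
As a variation on editing /etc/apt/sources.list directly, the same line can live in its own file under /etc/apt/sources.list.d, which makes it easy to remove later by deleting that one file. This is a sketch under that assumption: the function name add_proposed_updates and the file name bookworm-proposed-updates.list are illustrative choices, not something prescribed by Debian.

```shell
#!/bin/sh
# Hypothetical helper: write the proposed-updates entry from this answer to a
# dedicated .list file in DIR (default: /etc/apt/sources.list.d).
add_proposed_updates() {
    dir=${1:-/etc/apt/sources.list.d}
    printf '%s\n' \
        'deb https://ftp.debian.org/debian/ bookworm-proposed-updates contrib main non-free non-free-firmware' \
        > "$dir/bookworm-proposed-updates.list"
}

# Intended use (as root):
#   add_proposed_updates && apt update
# To undo later:
#   rm /etc/apt/sources.list.d/bookworm-proposed-updates.list && apt update
```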
  • Thanks a lot! What are the right steps with a bookworm-proposed-updates source? 1. Add the deb line, 2. run apt-get update nvidia-driver-* (if such a command exists in this form) to fetch only that driver from proposed-updates, 3. apt-get upgrade nvidia-driver-*, 4. remove the bookworm-proposed-updates source? Isn't it unsafe to keep the bookworm-proposed-updates source registered after patching? – Marc Le Bihan Feb 15 '24 at 17:13
  • 1
    @MarcLeBihan Please see my answer here and here for a detailed answer to your above comment. There's no harm in keeping the Proposed Updates Repository Enabled. If you need to install the patched nvidia-driver again before the update is moved to the main repo, apt will complain that you've installed an orphaned package – eyoung100 Feb 15 '24 at 22:39
  • @eyoung100 no, apt won’t complain. – Stephen Kitt Feb 17 '24 at 13:05
  • @StephenKitt: The PMS doesn't complain if an installed package exists on a system where the repository was removed? My apologies... – eyoung100 Feb 19 '24 at 21:17