
Why isn't the Linux module API backward compatible? I'm frustrated at having to hunt for updated drivers after every Linux kernel update.

I have a wireless adapter that needs a proprietary driver, but the manufacturer discontinued the device about 7 years ago. The code is very old and was written for Linux 2.6.0, so it doesn't compile with the latest Linux kernels. I have used many Linux distributions, but the problem is the same everywhere. Although there is an open-source driver distributed with the Linux kernel, it doesn't work. Some people are trying to modify the old proprietary code to make it compatible with the latest kernels, but when a new kernel is released, it takes months to make the code compatible with it, and within that time yet another new version is released. For this reason, I can't upgrade to a new Linux kernel; sometimes I can't even upgrade my distribution.

Jeff Schaller

2 Answers


Although I've contributed a few (very minor) patches to the Linux kernel, I don't count myself as a kernel developer. However, here's what I know:


A driver written for kernel version 2.6.0 pre-dates the elimination of the Big Kernel Lock (BKL), which happened in kernel version 2.6.39.

The BKL was created back when Linux was still a single-processor (single-core, single-thread) OS. As soon as SMP support was added, the developers recognized that the BKL would become a big bottleneck at some point, but as long as there were just a few cores/threads in the system in total, it was somewhat tolerable. It first became a real problem for people using Linux on supercomputers, and so the work began to replace everything that needed the BKL with more fine-grained locking mechanisms or, wherever possible, with lockless methods.
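To give a feel for what replaced it, here is a minimal sketch of what "fine-grained" and "lockless" mean in practice; the mydev structure and its functions are made up for illustration, but the locking primitives are the real kernel APIs:

    #include <linux/spinlock.h>
    #include <linux/atomic.h>

    /* Fine-grained: each device instance carries its own lock, so CPUs
     * working on different devices never contend with each other. */
    struct mydev {
            spinlock_t lock;   /* protects only this device's fields;
                                * set up with spin_lock_init() at probe time */
            int queued;
    };

    static void mydev_enqueue(struct mydev *dev)
    {
            spin_lock(&dev->lock);
            dev->queued++;
            spin_unlock(&dev->lock);
    }

    /* Lockless: a simple global counter needs no lock at all. */
    static atomic_t mydev_events = ATOMIC_INIT(0);

    static void mydev_count_event(void)
    {
            atomic_inc(&mydev_events);
    }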

On modern computers, where even regular desktops and high-powered laptops may have double-digit core counts, let alone servers, a 2.6.0 backward-compatible kernel module API would also need to implement the BKL.

If a legacy module says "I want to take the BKL", the rest of the kernel has no clue what the module is planning to do with it, so the backward-compatibility mechanism would have to take all the locks that replaced the BKL just to cover all the possibilities. That would be a big performance hit. The new lockless methods would also need to check for the legacy lock, which defeats the point of being lockless in the first place. So the very existence of the backward-compatibility mechanism would degrade system performance, even if no legacy modules were actually loaded.
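To make that concrete, here is roughly what taking the BKL looked like from a module's point of view. lock_kernel(), unlock_kernel() and <linux/smp_lock.h> were the real pre-2.6.39 API; the ioctl handler around them is a hypothetical placeholder:

    #include <linux/smp_lock.h>   /* gone since the BKL removal in 2.6.39 */

    /* Hypothetical legacy driver entry point. */
    static long legacy_dev_ioctl(unsigned int cmd, unsigned long arg)
    {
            lock_kernel();        /* one global lock; the kernel cannot tell
                                   * which data this is actually protecting */
            /* ... poke the hardware and shared kernel state ... */
            unlock_kernel();
            return 0;
    }

A compatibility layer could only honour lock_kernel() by taking every lock that replaced the BKL, which is exactly the whole-kernel serialisation the developers spent years getting rid of.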


More recently, the Spectre/Meltdown security patches made big changes to what needs to happen when the kernel/userspace boundary is crossed. Any module compiled before the Spectre/Meltdown fixes were implemented can be unreliable with post-Spectre/Meltdown kernels.

Just two weeks ago I was troubleshooting an old server that needed manual power-cycling when security updates were applied by automation. This had happened several times before, and was reproducible. I found out that it had a very old version of the proprietary megasr storage driver from before the Spectre/Meltdown patches, which was not included in the automatic updates. After updating the driver to the current version, the problem went away. By the way, this was on a plain RHEL 6.10 system.

I've also seen servers crashing when loading proprietary pre-Spectre/Meltdown hardware monitoring drivers with a post-Spectre/Meltdown kernel. Based on this experience, I'm fully convinced that the Spectre/Meltdown fixes need to be treated as a watershed event: the kernel and the modules need to be either all before-fixes, or all after-fixes versions; mixing and matching will only lead to grief and midnight wake-up calls for the on-call sysadmin.

And since Spectre was a CPU design level issue, it is "a gift that keeps on giving": some people will find novel ways to exploit the weaknesses, and then the kernel developers will need to figure out ways to block the exploits.


These are just two of the big problems a 2.6.0-compatible legacy kernel module API would need to solve. I'm sure there are many others.


And then there is the more philosophical side. Think about it: what makes Linux possible?

A big part of it is open hardware specifications. If the hardware specifications are open, anyone can participate. As the source code of the operating system is open, anyone can contribute, for the benefit of everyone. And you cannot keep hardware programming specifications as your trade secret if your driver code is open-sourced.

Linux kernel developers tend to believe in the open-source model. That is why they have made their design and development choices such that the preferred way for a hardware manufacturer to participate is to open-source the driver and get it merged into the main kernel source distribution; then (and only then) do you get the benefit of the entire kernel developer community in maintaining it.

This provides some incentive for hardware designers and manufacturers to make this possible. If you have something you wish to keep secret, make the effort to encapsulate it in an ASIC, or perhaps in signed firmware if you must. (If you do the latter, please grant others the permission to redistribute the firmware package.)

Since the kernel is open source, the kernel developers cannot exactly prevent others from maintaining proprietary drivers separately, but they have no incentive to care about them either.

In fact, the extra hassle caused by proprietary binary drivers in kernel debugging is an incentive for kernel developers not to care about proprietary driver development: "They make my work more difficult; why should I do anything in particular to make theirs any easier?"

So the kernel developers generally do whatever is most advantageous for them as a group/community. If that includes some module API change, so be it. Third-party drivers don't even enter the equation.

telcoM
  • Thank you for the history tidbits, but you describe individual major kernel API changes when in fact minor compatibility-breaking API changes are introduced many times every year, and you don't really answer the question of why they are so frequent and disruptive. It's not impossible to maintain driver API compatibility for at least three to five years, as is done in proprietary OSes. In terms of user space it's even more depressing: most 32-bit applications from Windows 95 still work in Windows 10, 25 years after they were written. Linux is nowhere near that level. – Artem S. Tashkinov Aug 19 '20 at 14:29
  • @ArtemS.Tashkinov You are right. Linux API changes too often. – Akib Azmain Turja Aug 19 '20 at 14:56
  • Oh, and this is not an answer either. It's a historical perspective on some changes. Meanwhile, Microsoft managed to add protections against transient-execution CPU vulnerabilities in their kernels while preserving driver APIs/ABIs across Windows 7/8/10, which means it's not impossible. Oh, and Google has done the same for the Android Linux kernel fork. Oh, and even Red Hat has done the same as well. – Artem S. Tashkinov Aug 19 '20 at 15:00
  • @ArtemS.Tashkinov Will the Linux API ever be stable? – Akib Azmain Turja Aug 19 '20 at 15:07
  • The Linux kernel APIs - never. Keep an eye on Google Fuchsia/Zircon - it's going to be an open source OS/kernel with stable APIs/ABIs. Google really wants to get rid of the Linux kernel to solve Android update issues once and for all. – Artem S. Tashkinov Aug 19 '20 at 15:10
  • @ArtemS.Tashkinov: What does user space have to do with anything? I thought Linus refuses to break user space, i.e. software compiled for Linux back in 1991 (that doesn't depend on user-space libraries) should still work. And isn't the 2.6 kernel a heck of a lot older than 3 to 5 years? – Oskar Skog Aug 19 '20 at 17:48
  • Statically compiled binaries are a way to go, right. What other brilliant ideas do you have? – Artem S. Tashkinov Aug 19 '20 at 18:41
  • +1 What I love about this answer is how it practically shows why an internal model change will ultimately force an API change, irrespective of the desires of the developers. It also highlights the point that the biggest bugbear in open-source API changes is the friction between open source and closed proprietary code. – Philip Couling Aug 19 '20 at 19:51

Greg Kroah-Hartman has written on this topic here: https://www.kernel.org/doc/html/v4.10/process/stable-api-nonsense.html

Besides some technical details regarding compiling C code, he draws out a couple of basic software-engineering issues that drive the decision.


The Linux kernel is always a work in progress. This happens for many reasons:

  • New requirements come along. People want their software to do more; that's why most of us upgrade: we want the latest and greatest features. These can require rework of the existing software.
  • Bugs are found which need fixing. Sometimes the bug is in the design itself and cannot be fixed without significant rework.
  • New ideas and idioms appear in the software world, and people find much easier, more elegant, or more efficient ways to do things.

This is true of most software, and any software that is not maintained will die a slow and painful death. What you are really asking is why that old, unmaintained code doesn't still work.

Why aren't old interfaces maintained?

Ensuring backward compatibility would require that old (often "broken" and insecure) interfaces be maintained. It is of course theoretically possible to do this, but it carries a significant cost.

Greg Kroah-Hartman writes

If Linux had to ensure that it will preserve a stable source interface, a new interface would have been created, and the older, broken one would have had to be maintained over time, leading to extra work for the [developers]. Since all Linux [developers] do their work on their own time, asking programmers to do extra work for no gain, for free, is not a possibility.

Even though Linux is open source, there is still only limited developer time to maintain it, so manpower can still be discussed in terms of "cost". The developers have to choose how they spend their time:

  • Spend a lot of time maintaining old/broken/slow/insecure interfaces. This can sometimes be double or triple the time it took to write the interface in the first instance.
  • Throw away the old interfaces and expect other software maintainers to [do their job and] maintain their own software (see the sketch below for what that looks like from the other maintainers' side).
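For illustration, here is a minimal sketch of how out-of-tree module maintainers typically cope when an interface is binned, using the standard LINUX_VERSION_CODE idiom. The underlying change is real (in kernel 5.6, proc_create() stopped taking a struct file_operations and now expects a struct proc_ops), but "mydriver" and the elided ops fields are placeholders:

    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/version.h>

    #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
    static const struct proc_ops mydriver_ops = {
            /* .proc_open, .proc_read, ... elided */
    };
    #else
    static const struct file_operations mydriver_ops = {
            /* .open, .read, ... elided */
    };
    #endif

    static int __init mydriver_init(void)
    {
            /* Same call either way, but the expected type changed, so
             * unmaintained out-of-tree code simply stops compiling. */
            proc_create("mydriver", 0444, NULL, &mydriver_ops);
            return 0;
    }

    module_init(mydriver_init);
    MODULE_LICENSE("GPL");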

On balance, binning interfaces is really cost-effective (for the kernel developers). If you want to know why developers don't spend months or years of their lives saving others from paying $10 for a new wifi adapter... that's the reason. Remember, that's time/cost-effective for the kernel developers, not necessarily cost-effective for you or for manufacturers.