What are the main stacks in Linux? What I mean is, for example when an interrupt occurs what stack will be used for it, and what is the difference between user process and kernel process stacks?
-
This may be helpful: http://duartes.org/gustavo/blog/post/journey-to-the-stack/ – SnakeDoc Dec 29 '15 at 18:28
-
usually, this is not the platform for asking homework sounding questions, but since you are a new member, I'd suggest reading this thread and making your own conclusions out of it : http://unix.stackexchange.com/questions/5788/how-is-an-interrupt-handled-in-linux – MelBurslan Dec 29 '15 at 18:29
-
Thank you, but that doesn't answer my question, sorry. First, it isn't homework. Second, I didn't ask about how stacks work (local variables, calling a subroutine, stack frames and so on) ... I am asking about what the main stacks in Linux are. What are the most important stacks used in Linux (or in any other modern OS)? For example ... which stack does an interrupt routine use? Where is it stored? And how is it loaded into the processor register (for example on the x86 arch) when an interrupt occurs? – Nik Novák Dec 29 '15 at 19:11
2 Answers
This is highly platform-specific. Unless you restrict the question to a particular platform (even the difference between x86-32 and x86-64 is fundamental), it can't be answered. But, limiting it to x86 as your last comment suggests, I can offer some information.
There are two main styles of service request ("syscall") from user land to kernel land: interrupt-styled and sysenter-styled. (These terms are my own, coined for this description.) Interrupt-styled requests are handled by the processor in exactly the same manner as an external interrupt. In x86 protected mode, such a request is made with int 0x80 (newer) or lcall 7,0 (the oldest variant, SysV-compatible) and implemented using so-called gates (task gate, interrupt gate, etc.), configured as special segment descriptors. The task switch is performed by the processor itself. During this switch, the old task's registers, including the stack pointer, are stored in the old task's TSS, and the new task's registers, including the stack pointer, are loaded from the new task's TSS. In other words, all "usual" registers are stored and loaded (so this is a very long operation). (FPU/SSE/etc. state is a separate issue; its switching is postponed - see the documentation for details.)
For handling such service requests, the kernel prepares a separate stack for each thread (a.k.a. LWP - lightweight process), because a thread can be switched away during any blockable function call. Such a stack is usually small (for example, 4KB).
Since an x86 task switch always changes the stack pointer, there is no chance to reuse the userland stack in the kernel. On the other hand, such reuse must not be allowed at all (except for a small amount of current-thread data), because a user process page cannot be trusted: another active thread could change or even unmap it. That's why using the userland stack while running in the kernel is simply prohibited, so each thread must have separate stacks for its user land and its kernel land; this remains true for the modern, sysenter-styled processing. (And, as already noted above, each thread's kernel-land stack must also be distinct from every other thread's.)
Sysenter-styled processing was designed much later and is implemented with the SYSENTER and SYSCALL processor instructions. They differ in that they were not designed around the old (too rigid) restriction that a system call must preserve all registers. Instead, they were designed much closer to a usual function-call ABI, which allows a function to arbitrarily change some registers (called "scratch" registers in most ABIs): only a few registers are changed, and the handler routines take care of preserving the old values. The SYSENTER/SYSEXIT pair (both 32- and 64-bit) clobbers the old contents of RDX and RCX (in a weird manner - userland must prefill them with the proper values), and the new RIP and RSP are loaded from the respective MSRs, so the stack is switched to the kernel-land one immediately. By contrast, SYSCALL/SYSRET (64-bit only) use RCX and R11 for the return address and flags, and do not change the stack by themselves. The kernel's entry code then immediately switches to its own stack before saving registers, because 1) there is no guarantee that the userland stack is big enough to hold all the needed values, and 2) for the security reasons described above. From this point on, we again have a per-thread kernel stack.
Besides userland threads, there are many kernel-only threads (you can see them in ps output as names in square brackets). Each such thread has its own stack. They implement 1) periodic routines started on some event or timeout, 2) transient actions, or 3) actions requested by real interrupt handlers. (For case 3 they were named "bh" in old kernels and "ksoftirqd" in newer ones.) A large part of these threads is pinned to a single logical CPU. Since they have no user land, they have no userland stack.
External interrupt handlers in Linux are limited, AFAIK, to no more than one simultaneously executing handler per logical CPU; during such a handler's execution, no other IO interrupts are allowed. (NMIs are a terrible exception with bug-prone handling.) They enter through an interrupt gate and get their own stack for each logical CPU, for the same reasons as described above.
As already noted, most of this is x86-specific. Task switching with mandatory stack-pointer replacement is rare on other architectures. For example, ARM32 has a separate banked stack pointer per privilege mode, so if an external interrupt arrives while in kernel land, the stack pointer is not changed.
Some details in this answer may be obsolete due to the high speed of kernel development. Treat it only as a general overview and verify it against the concrete version you explore. For more on x86 interrupt handling and task switching, please refer to the "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1" (freely available on Intel's website).

On Linux, code running in userspace uses the userspace stack. When a process runs in kernelspace, it uses a kernel stack "owned" by that process. Interrupts are handled in-kernel, on a kernel stack.
Probably a better place to find such details about the private parts of your favorite kernel is to rummage around sites catering to the people programming it, e.g. kernelnewbies, or to search LWN's "kernel pages" for Linux. You should be able to find similar places for the BSDs, Solaris, and even MacOS. Windows information might be harder to come by...
Such information isn't discussed in typical operating-system texts; you'd have to look for descriptions aimed at developers.

-
Thank you for your answer. I can imagine a situation where an interrupt occurs and its handler enables further interrupts (nested interrupts). So, what if the stack size is small? What happens on overflow? – Nik Novák Dec 29 '15 at 21:58
-
Handling of another external interrupt while one is already running is not allowed, AFAIK, in Linux. This can be too restrictive for some cases but greatly simplifies the design. Multiple interrupt handlers running in parallel are allowed in the SysV and BSD designs (look for splbio, splnet, etc.). Anyway, interrupt handlers are written so that they are limited to the parts that can't be performed in softirq context. There are "realtime" kernel versions which allow multiple concurrent interrupt handlers; see my answer for the generic peculiarities that come with such permission. – Netch Dec 30 '15 at 08:37 -
@Netch This sounds like a powerful solution, BUT. I am afraid that Linux uses nested interrupts... – Nik Novák Dec 30 '15 at 10:04
-
@Netch So I asked about that. Here is a link: http://stackoverflow.com/questions/34527763/linux-nested-interrupts?noredirect=1#comment56797629_34527763 – Nik Novák Dec 30 '15 at 11:25
-
AFAIK, nested interrupts from different IRQ lines are allowed, but NOT from the same IRQ! IOW, while interrupt 'n' is being handled, interrupt 'n' is turned off, i.e., it is non-reentrant. Of course, top and bottom halves can still execute concurrently... – kaiwan May 23 '16 at 07:21