How does a keyboard press get processed in the Linux Kernel?

Question

I'm currently learning about the Linux Kernel and OSes in general, and while I have found many great resources concerning IRQs, Drivers, Scheduling and other important OS concepts, as well as keyboard-related resources, I am having a difficult time putting together a comprehensive overview of how the Linux Kernel handles a button press on a keyboard. I'm not trying to understand every single detail at this stage, but am rather trying to connect concepts, somewhat comprehensively.

I have the following scenario in mind:

I'm on an x64 machine with a single processor.
There're a couple of processes running, notably the Editor VIM (Process #1) and say LibreOffice (Process #2).
I'm inside VIM and press the a-key. However, the process that's currently running is Process #2 (with VIM being scheduled next).

This is how I imagine things to go down right now:

The keyboard, through a series of steps, generates an electrical signal (USB Protocol Encoding) that it sends down the USB wire.
The signal gets processed by a USB-Controller, and is sent through PCI-e (and possibly other controllers / buses?) to the Interrupt Controller (APIC). The APIC triggers the INT Pin of the processor.
The processor switches to Kernel Mode and requests an IRQ-Number from the APIC, which it uses as an offset into the Interrupt Descriptor Table (IDT, located via the IDTR). A descriptor is obtained, which is then used to obtain the address of the interrupt handler routine. As I understand it, this interrupt handler was initially registered by the keyboard driver?
The interrupt handler routine (in this case a keyboard handler routine) is invoked.

This brings me to my main question: By which mechanism does the interrupt handler routine communicate the pressed key to the correct Process (Process #1)? Does it actually do that, or does it simply write the pressed key into a buffer (available through a char-device?), that is read-only to one process at a time (and currently "attached" to Process #1)? I don't understand at which time Process #1 receives the key. Does it process the data immediately, as the interrupt handler schedules the process immediately, or does it process the key data the next time that the scheduler schedules it?

When this handler returns (IRET), the context is switched back to the previously executing process (Process #2).

Related: https://unix.stackexchange.com/q/116629/117549 – Jeff Schaller Oct 05 '19 at 18:00

score 9 · Accepted Answer · edited Apr 13 '20 at 05:29

Your understanding so far is correct, but you miss most of the complexity that's built on that. The processing in the kernel happens in several layers, and the keypress "bubbles up" through the layers.

The USB communication protocol itself is a lot more involved. The interrupt handler routine for USB handles this, and assembles a complete USB packet from multiple fragments, if necessary.

The key press uses the so-called HID ("Human interface device") protocol, which is built on top of USB. So the lower USB kernel layer detects that the complete message is a USB HID event, and passes it to the HID layer in the kernel.

The HID layer interprets this event according to the HID descriptor it has requested from the device on initialization. It then passes the events to the input layer. A single HID event can generate multiple key press events.
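To make the "one HID event, several key events" point concrete, here is a minimal sketch. It assumes the fixed 8-byte "boot protocol" keyboard report (modifier bitmap, one reserved byte, six usage-ID slots); real devices may use a different layout, which is exactly what the HID descriptor describes. The function names are made up for illustration and are not kernel APIs.

```python
# Sketch only: assumes the 8-byte boot-protocol keyboard report layout.
# decode_report and diff_reports are hypothetical helper names.

MODIFIER_BITS = ["LeftCtrl", "LeftShift", "LeftAlt", "LeftGUI",
                 "RightCtrl", "RightShift", "RightAlt", "RightGUI"]


def decode_report(report: bytes) -> set:
    """Return the set of keys that one report says are currently held down."""
    if len(report) != 8:
        raise ValueError("boot keyboard reports are 8 bytes long")
    # Byte 0 is a bitmap of the eight modifier keys.
    held = {name for bit, name in enumerate(MODIFIER_BITS)
            if report[0] & (1 << bit)}
    # Byte 1 is reserved; bytes 2..7 hold up to six usage IDs (0 = empty slot).
    held |= {f"usage_0x{usage:02x}" for usage in report[2:8] if usage != 0}
    return held


def diff_reports(previous: bytes, current: bytes) -> list:
    """Turn two consecutive reports into individual press/release events."""
    before, after = decode_report(previous), decode_report(current)
    events = [(key, "press") for key in sorted(after - before)]
    events += [(key, "release") for key in sorted(before - after)]
    return events


idle = bytes(8)
shift_a = bytes([0x02, 0x00, 0x04, 0, 0, 0, 0, 0])  # LeftShift + usage 0x04 ("a")
print(diff_reports(idle, shift_a))
```

A single report that differs from the previous one in two places (Shift and the a key) yields two separate events, which is the kind of fan-out the HID layer performs before handing things to the input layer.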

The input layer uses kernel keyboard layout tables to map the scan code (position of the key on the keyboard) to a key code (like A) and interprets Shift, Alt, etc. The result of this interpretation is made available via /dev/input/event* to userland processes. You can use evtest to watch those events in real-time.
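If you would rather decode those events yourself than use evtest, something like the following sketch should work. It assumes a 64-bit Linux system, where each record read from /dev/input/event* is a struct input_event (a struct timeval followed by a 16-bit type, a 16-bit code and a 32-bit value). The helper names are my own, and the device path you pass to read_events is an assumption that depends on your machine (you also need read permission on it).

```python
import struct

# Native layout of struct input_event on 64-bit Linux:
# two longs (struct timeval), __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01   # event type for key presses and releases
KEY_A = 30      # kernel key code for the A key
VALUE_NAMES = {0: "release", 1: "press", 2: "autorepeat"}


def decode_event(raw: bytes) -> dict:
    """Unpack one raw record into its fields."""
    sec, usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, raw)
    return {"time": sec + usec / 1_000_000,
            "type": ev_type, "code": code, "value": value}


def describe(event: dict):
    """Return a short description for key events, None for everything else."""
    if event["type"] != EV_KEY:
        return None
    action = VALUE_NAMES.get(event["value"], "unknown")
    return f"key code {event['code']} {action}"


def read_events(path: str, count: int):
    """Yield `count` decoded events from a device node (path is machine-specific)."""
    with open(path, "rb", buffering=0) as dev:
        for _ in range(count):
            yield decode_event(dev.read(EVENT_SIZE))


# Demonstration without hardware: build one record by hand and decode it.
sample = struct.pack(EVENT_FORMAT, 1, 500000, EV_KEY, KEY_A, 1)
print(describe(decode_event(sample)))
```

The read in read_events blocks until an event arrives, so the reading process sleeps until the kernel wakes it up with data.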

But processing is not finished here. The X Server (responsible for graphics) has a generic evdev driver that reads events from /dev/input/event* devices, and then maps them again according to a second set of keyboard layout tables (you can see those partly with xmodmap and fully via the XKB extension). This is because the X server predates the kernel input layer, and in earlier times had drivers to handle mouse and PS/2 keys directly.
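As a rough illustration of that second mapping step: with the evdev-based drivers, the X keycode is the kernel key code plus 8, and the keycode is then looked up in the layout tables to get a keysym. The two-entry table below is a hypothetical stand-in for the real XKB data, and the function names are mine.

```python
# Sketch only: KEYSYMS is a tiny, hypothetical excerpt, not real XKB data.
X_KEYCODE_OFFSET = 8   # X keycodes start at 8, so evdev codes are shifted by 8
KEY_A = 30             # kernel key code for the A key

# X keycode -> (keysym without Shift, keysym with Shift)
KEYSYMS = {38: ("a", "A")}


def to_x_keycode(kernel_code: int) -> int:
    """Map a kernel key code to the keycode an X client would see."""
    return kernel_code + X_KEYCODE_OFFSET


def lookup_keysym(x_keycode: int, shift: bool = False) -> str:
    """Pick the keysym for a keycode, depending on the Shift state."""
    plain, shifted = KEYSYMS[x_keycode]
    return shifted if shift else plain


keycode = to_x_keycode(KEY_A)
print(keycode, lookup_keysym(keycode), lookup_keysym(keycode, shift=True))
```

The keycode 38 is the number xev reports for the a key on a typical setup using these drivers.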

Then the X server sends a message to the X client (application) containing the keyboard event. You can see those messages with the xev application. LibreOffice will process this event directly; VIM, on the other hand, will be running in an xterm, which will process the event, (you guessed it) again add some extra processing to it, and finally pass it to VIM via stdin.
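The last hop (terminal emulator to VIM) goes through a pseudo-terminal pair. A minimal sketch of that hop, assuming a Unix-like system: the code plays the role of the terminal emulator by writing a byte to the master side, and plays the role of the application by reading it from the slave side, which is what the application has as its stdin. The function name is hypothetical, and the slave is put into raw mode because editors like VIM want each key immediately rather than line by line.

```python
import os
import tty


def send_key_through_pty(key: bytes) -> bytes:
    """Write `key` on the master side of a pty and read it back on the slave side."""
    master_fd, slave_fd = os.openpty()
    try:
        # Raw mode: no line buffering, no echo, no character translation.
        tty.setraw(slave_fd)
        # The terminal emulator would do this after receiving the X key event.
        os.write(master_fd, key)
        # The application would do this on its stdin.
        return os.read(slave_fd, len(key))
    finally:
        os.close(master_fd)
        os.close(slave_fd)


print(send_key_through_pty(b"a"))
```

The read on the slave side blocks until something is written on the master side, so an application waiting for input simply sleeps until the terminal emulator forwards a key.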

Complicated enough?

Very interesting explanation. Where did you learn all of this? What is your background? — tritium_3, Apr 12 '20 at 09:22
@tritium_3: About 25 years of using Linux, curiosity, and the ability to google and read. And I learned about X before that ... — dirkt, Apr 12 '20 at 10:24
@dirkt What if multiple processes make direct read system calls on the keyboard device, without going through the X server: how does the kernel service all of them? Does it copy the bytes to all blocked processes, or does it return "NOT AVAILABLE" to all subsequent read system calls? — Declan Nnadozie, Jun 01 '20 at 09:30
@DeclanNnadozie: Try that yourself: run evtest /dev/input/... in two terminals, with the path to your keyboard. Then try again with the --grab option. — dirkt, Jun 01 '20 at 11:47
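For readers who cannot run that experiment right now, here is a toy model of what it shows, under the assumption that every open of an event device gets its own queue, that each event is copied to every queue, and that a grab restricts delivery to the grabbing client. The class and method names are hypothetical and this is not the kernel's actual data structure.

```python
from collections import deque


class ToyInputDevice:
    """Toy model of one /dev/input/eventN node with several readers."""

    def __init__(self):
        self.clients = []      # one queue per open file handle
        self.grabber = None    # the client holding an exclusive grab, if any

    def open(self) -> deque:
        """Model a process opening the device: it gets a private queue."""
        queue = deque()
        self.clients.append(queue)
        return queue

    def grab(self, client: deque) -> None:
        """Model an exclusive grab; a second grab by someone else is refused."""
        if self.grabber is not None and self.grabber is not client:
            raise RuntimeError("device already grabbed")  # stand-in for an error return
        self.grabber = client

    def ungrab(self, client: deque) -> None:
        if self.grabber is client:
            self.grabber = None

    def deliver(self, event) -> None:
        """Copy an event to every reader, or only to the grabber if there is one."""
        targets = [self.grabber] if self.grabber is not None else self.clients
        for queue in targets:
            queue.append(event)


device = ToyInputDevice()
first, second = device.open(), device.open()
device.deliver("KEY_A press")        # both readers see this one
device.grab(first)
device.deliver("KEY_A release")      # only the grabbing reader sees this one
print(list(first), list(second))
```

In this model nobody gets a "not available" answer: without a grab every reader receives its own copy, and with a grab the other readers just see nothing until the grab is released.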

score 3 · Answer 2 · answered Oct 05 '19 at 18:13

does it simply write the pressed key into a buffer (available through a char-device?)

Yes, I would say so.

And then there is a kind of cascade from (low level) console to tty (virtual) to pseudo-tty. A key press gets written to /dev/tty1 or /dev/tty5 depending on which "console" is active.

And in xterm (ps axf output):

  467 tty1     Ss     0:38  \_ -bash
 5820 tty1     S+     0:00      \_ xinit fvwm -- vt9
 5821 tty9     S<sl+  54:15          \_ /usr/lib/Xorg :0 vt9
 5831 tty1     S      0:00          \_ xterm -geometry +1+1 -n login fvwm
 5833 pts/0    Ss+    0:38              \_ fvwm
 ...
 ...
  773 pts/0    S      0:07                  \_ xterm
  775 pts/2    Ss+    0:00                  |   \_ bash
14452 pts/0    S      0:04                  \_ xterm
14454 pts/1    Ss     0:00                  |   \_ bash
14507 pts/1    S      0:00                  |       \_ xfontsel
31044 pts/1    R+     0:00                  |       \_ ps ax f
19549 pts/0    S      0:00                  \_ xterm
19551 pts/3    Ss+    0:00                      \_ bash

This shows how Xorg gets started on tty9 from tty1, and how fvwm (window manager) and xterm (terminal emulator) "take" /dev/pts/0, and it is the new shells that get /dev/pts/1, pts/2, pts/3 and so on.

Now, no matter if I activate that pid 19551 pts/3 bash process by pointing at its xterm window, and then press a key, or if I do echo hello >/dev/pts/3 from a console vt like /dev/tty5, the characters go to the correct process.

man ps explains (well, it does list them) under PROCESS STATE CODES:

S    interruptible sleep (waiting for an event to complete)
s    is a session leader    
+    is in the foreground process group
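To read the STAT column in the listing above, a small decoder can help. This is only a sketch: the table covers the three codes quoted here plus the other three that appear in the listing (R, < and l, with meanings taken from man ps), and the function name is made up.

```python
# Sketch only: a partial table, limited to the codes seen in the listing above.
STATE_CODES = {
    "S": "interruptible sleep",
    "R": "running or runnable",
}
FLAG_CODES = {
    "s": "session leader",
    "+": "foreground process group",
    "<": "high-priority",
    "l": "multi-threaded",
}


def decode_stat(stat: str) -> list:
    """Split a STAT string into its state (first character) and flags (the rest)."""
    state, flags = stat[0], stat[1:]
    parts = [STATE_CODES.get(state, f"unknown state {state!r}")]
    parts += [FLAG_CODES.get(flag, f"unknown flag {flag!r}") for flag in flags]
    return parts


for stat in ["Ss+", "S<sl+", "R+"]:
    print(stat, "->", ", ".join(decode_stat(stat)))
```

So the bash with "Ss+" on pts/3 is asleep, waiting for an event such as a key arriving on its terminal, and it is the session leader and in the foreground process group of that terminal.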

I leave you with these keywords...
