There are many layers involved here, and the result depends heavily on context. But some general info:
Firstly, the kernel. It is responsible for handling the actual keyboard hardware: it registers keypresses through some mechanism (at the very bottom, usually an interrupt handler) and records which key was pressed using an unambiguous representation of that particular key (usually a keycode of some kind).
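To make "a keycode of some kind" concrete: on Linux, the evdev interface exposes these kernel key events through device files such as /dev/input/event0, as a stream of fixed-size `struct input_event` records. The minimal sketch below decodes such records from a byte buffer. It assumes 64-bit Linux (where the timestamp is two native longs), and the constants are taken from `linux/input-event-codes.h`. Reading a real device usually requires root, so the demo uses a synthesized buffer instead.

```python
import struct

# Assumed layout of struct input_event on 64-bit Linux: struct timeval
# (two native longs: seconds, microseconds), then __u16 type, __u16 code,
# __s32 value. With native "@" alignment this is 24 bytes on x86-64.
EVENT_FORMAT = "@llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

EV_KEY = 0x01  # event type for keys and buttons (linux/input-event-codes.h)
KEY_A = 30     # keycode of the key in the 'A' position
VALUE_NAMES = {0: "release", 1: "press", 2: "repeat"}


def decode_key_events(data: bytes):
    """Yield (keycode, action) for each EV_KEY record in a raw evdev byte buffer."""
    for offset in range(0, len(data) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, ev_type, code, value = struct.unpack_from(EVENT_FORMAT, data, offset)
        if ev_type == EV_KEY:  # skip EV_SYN and other event types
            yield code, VALUE_NAMES.get(value, str(value))


if __name__ == "__main__":
    # Synthesized stand-in for bytes read from /dev/input/eventN:
    # press of KEY_A, an EV_SYN report marker (type 0), release of KEY_A.
    fake = (struct.pack(EVENT_FORMAT, 0, 0, EV_KEY, KEY_A, 1)
            + struct.pack(EVENT_FORMAT, 0, 0, 0, 0, 0)
            + struct.pack(EVENT_FORMAT, 0, 1000, EV_KEY, KEY_A, 0))
    print(list(decode_key_events(fake)))  # [(30, 'press'), (30, 'release')]
```

Note that the kernel only reports which physical key changed state; nothing at this level says the key means "a".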
Secondly, the consumers. Under Linux, there are basically two options here: if you're on a VT, keyboard input is fed into the TTY subsystem and turns into incoming bytes on a terminal device; if you're in the GUI, key events are given to Xorg and passed on to applications via the X protocol. The latter is the normal case nowadays, though the former is still well supported. The Xorg server reads raw key events from the kernel and passes them on, in similarly raw form, to X applications. That is to say, on the plain X level, applications just receive bare key-press and key-release notifications, and it is the application's responsibility to impose semantics on them. This includes even such basic and near-universal things as the key marked 'A' producing the character 'a' ordinarily, but 'A' when the Shift key is held. Needless to say, there exist many libraries for doing most of this, and new applications usually don't need to concern themselves with it; but it is still work done on the application level.
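The minimal sketch below shows the kind of work an X application (or, in practice, a library such as Xlib's `XLookupString` or libxkbcommon) has to do on top of bare press/release events: remember modifier state, and pick a symbol from a keymap. The keymap here is a hypothetical three-key toy, far simpler than a real XKB keymap, and the keycode values assume the usual evdev-derived X keycodes (kernel keycode + 8), so the 'A' key is 38 and the two Shift keys are 50 and 62.

```python
from dataclasses import dataclass, field

# Hypothetical toy keymap: X keycode -> (unshifted, shifted) symbol.
# Assumes evdev-derived X keycodes (kernel keycode + 8).
KEYMAP = {
    38: ("a", "A"),  # the key marked 'A'
    56: ("b", "B"),
    10: ("1", "!"),
}
SHIFT_KEYCODES = {50, 62}  # left and right Shift


@dataclass
class KeyboardState:
    shift_down: set = field(default_factory=set)  # Shift keys currently held

    def handle(self, keycode: int, pressed: bool):
        """Turn one bare press/release event into text, or None if it yields none."""
        if keycode in SHIFT_KEYCODES:
            # Modifiers only change state; they produce no characters themselves.
            if pressed:
                self.shift_down.add(keycode)
            else:
                self.shift_down.discard(keycode)
            return None
        if not pressed or keycode not in KEYMAP:
            return None  # releases and unmapped keys produce no text here
        unshifted, shifted = KEYMAP[keycode]
        return shifted if self.shift_down else unshifted


if __name__ == "__main__":
    state = KeyboardState()
    # 'A' key alone, then Shift+'A', then the '1' key after Shift is released.
    events = [(38, True), (38, False), (50, True), (38, True),
              (38, False), (50, False), (10, True)]
    typed = "".join(ch for ch in (state.handle(k, p) for k, p in events) if ch)
    print(typed)  # aA1
```

The same key-press event (keycode 38) yields different text depending on state the application itself tracked; that is the "imposing semantics" the X server leaves to its clients.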
Because the work is done on the application level, what happens next is almost entirely application-dependent. A terminal emulator takes X protocol keystrokes and does the job the kernel's console keyboard handling does on a VT: it turns them into bytes and writes them to a pseudo-terminal, so that a shell running on that terminal can read them. Normal GUI programs use one toolkit or another, and every toolkit provides keyboard handling as basic functionality, which enables the usual semantics for text boxes, selected elements, and so forth. Larger and more sophisticated programs, like a browser or a full-screen game, probably do their own keyboard handling on the X level, for the greater flexibility it affords. But it's all done by the individual application, and depends only on what the application decides.
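As an illustration of the terminal-emulator case, here is a minimal sketch of encoding key presses as the bytes an emulator writes to the pseudo-terminal. The key names ("Up", "Return", and so on) are hypothetical stand-ins for X keysym names, only a handful of keys are covered, and the escape sequences are the common xterm-style ones; real emulators handle many more keys and modes.

```python
# Hypothetical key names standing in for X keysyms; final byte of each arrow sequence.
ARROW_FINALS = {"Up": "A", "Down": "B", "Right": "C", "Left": "D"}


def key_to_bytes(key: str, ctrl: bool = False, app_cursor_mode: bool = False) -> bytes:
    """Encode one key press as the bytes a terminal emulator would write to the pty."""
    if key in ARROW_FINALS:
        # Normal cursor mode sends CSI sequences (ESC [ A); application
        # cursor mode (DECCKM set) sends SS3 sequences (ESC O A).
        prefix = "\x1bO" if app_cursor_mode else "\x1b["
        return (prefix + ARROW_FINALS[key]).encode("ascii")
    if key == "Return":
        return b"\r"  # Enter sends CR; the tty's ICRNL input flag maps it to \n
    if key == "BackSpace":
        return b"\x7f"  # DEL, a common default for Backspace on Linux terminals
    if len(key) == 1:
        if ctrl and key.isascii() and key.isalpha():
            # Ctrl+letter keeps only the low 5 bits: Ctrl+C -> 0x03, which the
            # tty line discipline turns into SIGINT when ISIG is enabled.
            return bytes([ord(key.upper()) & 0x1F])
        return key.encode("utf-8")  # ordinary characters are sent as UTF-8
    raise ValueError(f"no encoding for key {key!r} in this sketch")


if __name__ == "__main__":
    print(key_to_bytes("c", ctrl=True))  # b'\x03'
    print(key_to_bytes("Up"))            # b'\x1b[A'
```

A real emulator would pass the result to something like `os.write(master_fd, data)` on the pseudo-terminal master; from there, the kernel's line discipline applies the same processing (echo, line editing, Ctrl+C handling) it would for a VT.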