Syscalls: How does a user process pass/receive data to/from the kernel?

Question

The relationship between user and kernel virtual addresses has been discussed in a few earlier questions (linked below), but as far as I understand, the user process cannot read or write kernel addresses.

So how does a user process send data to, and receive data from, the kernel?

Is it through memory? If so, where in the memory layout? Maybe CPU registers?

Related questions:

What's the use of having a kernel part in the virtual memory space of Linux processes?
Do the virtual address spaces of all the processes have the same content in their "Kernel" parts?

You will get more technical answers than this, but all that is needed is for the kernel to be able to read and write the user process's address space. Every system call provides user-mode addresses and sizes for the kernel to access. Consider that the kernel execs the user process in the first place: it has write access to the whole of user space while doing so (including being able to initialise memory that is read-only for the process itself). – Paul_Pedant, Jul 07 '20 at 19:34

score 2 · Accepted Answer · answered Jul 07 '20 at 20:43

Let's consider an example: a simple x86 Hello World program for Linux, that prints a message to stdout and exits. It needs to pass a couple of data items to the kernel:

Text string to output,
Exit code.

Here's the assembly code (to be assembled with FASM):

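format ELF executable
segment readable executable

; system call numbers (32-bit x86 Linux)
SYS_EXIT=1
SYS_WRITE=4
; file descriptors
STDOUT=1

entry $
start:
    ; write(STDOUT, message, messageLength)
    mov eax, SYS_WRITE      ; system call number
    mov ebx, STDOUT         ; 1st argument: file descriptor
    mov ecx, message        ; 2nd argument: address of the text
    mov edx, messageLength  ; 3rd argument: number of bytes
    int 0x80                ; enter the kernel

    ; exit(0)
    mov eax, SYS_EXIT       ; system call number
    xor ebx, ebx            ; 1st argument: exit code 0
    int 0x80

message:
    db "Hello, world!",0xa
messageLength=$-message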

All this program does to achieve its main goal (message output) is:

Set the appropriate CPU registers to the system call number (for the sys_write syscall), the file descriptor (stdout), the address of the message, and the message length.
Make the system call, in this example by means of software interrupt 0x80.

Exiting follows a similar sequence: set the registers to the system call number and exit code, then make the system call.

Which registers to set to which values is defined by the system call calling convention.

Once the kernel starts executing the syscall handler, the handler reads the register values from the application's saved context and interprets them according to the calling convention. In particular, when it sees that the system call is sys_write, it takes the address and length of the message and uses them to read from user-space memory. This data (along with the file descriptor) is then passed to the driver that does the actual work.
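
For comparison, here is a minimal C sketch of the same pattern, assuming Linux with glibc. It uses the generic syscall(2) wrapper, which places the system call number and arguments in whatever registers the current architecture's convention requires. The number is taken from <sys/syscall.h> rather than hard-coded, because it differs between 32-bit x86 and x86-64. The helper name hello_to is made up for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: hands the kernel a file descriptor, a user-space
   address and a length, exactly the three values the assembly version
   puts in ebx, ecx and edx. */
long hello_to(int fd)
{
    static const char message[] = "Hello, world!\n";

    /* The number and arguments travel in registers; the kernel then
       reads the bytes themselves from user memory at `message`. */
    return syscall(SYS_write, fd, message, sizeof message - 1);
}
```

Calling hello_to(STDOUT_FILENO) prints the message; the return value is the number of bytes the kernel accepted.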

answered Jul 07 '20 at 20:43

Ruslan


Thanks. Is it actually doing a software interrupt (software IRQ)? Is that the same as throwing an exception? (are they one and the same thing?). Also, any particular reason why the type of system call is defined on a register but the actual contents of the sys call in user space memory? e.g. why not have everything in user space memory? Is that because the precise location in user space memory is not fixed or established for every sys call? – Josh Jul 08 '20 at 04:32
@Josh it is actually a software interrupt, see the documentation for the x86 INT instruction. It's not the same as an exception (this word also has multiple meanings: e.g. CPU exception, C++ exception, which one did you mean?). Also, it's only one of the syscall mechanisms: on i686 the sysenter mechanism is also available. Where the parameters of the syscall are passed is defined by the calling convention. In most cases, the parameters (i.e. the values you pass to a libc wrapper function like write(2)) are passed through registers, along with the syscall number. The pointers just point to memory. – Ruslan Jul 08 '20 at 06:43
Thanks @Ruslan - By exception I mean a CPU exception. Like a stack overflow exception, e.g. when the CPU pushes data to the process stack and the OS may need to grow the stack, or a page fault exception when a page in virtual memory isn't available in physical memory. I thought these things are called CPU exceptions and are different from a software IRQ, but I could be wrong. – Josh Jul 08 '20 at 12:43
@Josh CPU exceptions are faults that happen (usually) unintentionally, due to some kind of violation (invalid access, division by zero, invalid instruction etc.). A software interrupt (not an IRQ!) is a deliberate call from the program code to an interrupt handler (ISR). It can result in execution of the same ISR that would handle an IRQ mapped to this vector (if there is one), or a fault handler, but they are all separate. Don't mix up an IRQ and a software interrupt: although to an ISR they look very similar, an IRQ is a hardware event, while a software interrupt is a software instruction. – Ruslan Jul 08 '20 at 12:55
Thanks @Ruslan. Got it! – Josh Jul 08 '20 at 13:05
Hi @Ruslan, great answer! Just one question: I've learned that a syscall incurs a context switch, where the user process is replaced with a kernel process. In your example the hello world data is stored in user memory space, so how can the kernel process access it? Thanks a lot! – mzoz Aug 26 '20 at 14:47
@mzoz the process isn't replaced with a kernel process. It's interrupted: the context of the process (i.e. the set of register values) is saved into RAM, and the corresponding kernel values are loaded from their place in RAM. And to access user data, the kernel can set up its page table so that user pages can be accessed. – Ruslan Aug 26 '20 at 14:59
@Ruslan much appreciated! – mzoz Aug 26 '20 at 15:17
@mzoz on Stack Exchange appreciation is usually expressed by voting. This reduces the amount of comments and gives some reputation to the post author ;) – Ruslan Aug 26 '20 at 15:22
@Ruslan sorry my bad, totally forgot that.. just upvoted cheers! :) – mzoz Aug 26 '20 at 23:57

score 2 · Answer 2 · answered Jul 07 '20 at 20:46

the user process cannot read or write kernel addresses

No, but the kernel can read and write the user addresses, if/when it wants to. Linux system calls pass the system call number and arguments in CPU registers. (Look up something like "Linux system call calling convention".)
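
The return path uses a register too: on x86 the kernel leaves the result in eax (rax on x86-64), and the C library hands it back as the wrapper's return value. A small sketch, assuming Linux with glibc; raw_getpid is a name made up here:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: a system call with no arguments and no pointers. */
long raw_getpid(void)
{
    /* Nothing is read from or written to user memory here; the
       process ID comes back in the return-value register. */
    return syscall(SYS_getpid);
}
```

The value returned matches what the ordinary getpid() wrapper reports for the same process.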

Some of those arguments may be pointers, in which case the kernel knows to look for the data at the pointed-to location in the user address space. As far as I understand, the kernel actually copies the data to kernel space before using it (otherwise another user-space thread might modify the data during the system call). But the data can be located anywhere in the user address space, as required by the program.
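
To see pointers used in both directions, and what happens when a pointer is bad, here is a sketch assuming Linux. It sends bytes through a pipe: write(2) gives the kernel an address to read from, and read(2) gives it an address to write into. If the address is not mapped in the process, the kernel's copy fails and the call returns -1 with errno set to EFAULT rather than crashing the program. Both function names are invented for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round-trips `len` bytes through a pipe.
   Returns the number of bytes the kernel copied into `out`, or -1. */
long pipe_roundtrip(const char *in, char *out, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long n = write(fds[1], in, len);   /* kernel reads from user memory  */
    if (n > 0)
        n = read(fds[0], out, len);    /* kernel writes into user memory */

    close(fds[0]);
    close(fds[1]);
    return n;
}

/* Hypothetical helper: asks the kernel to read `len` bytes from `addr`
   and returns the resulting errno (0 if the write succeeded). */
int errno_after_write_from(const void *addr, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long n = write(fds[1], addr, len);
    int err = (n == -1) ? errno : 0;   /* save errno before close() */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Passing an address that is not mapped, such as (const void *)1 on a typical Linux system, to errno_after_write_from is expected to yield EFAULT.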
