Using a file descriptor in a system call

Question

Let's suppose that I want to use a file descriptor in a system call (the fd number would be provided via a parameter). What is to be expected if a user space program uses this system call? Where would the OS look for this specific fd? In the current process’s file descriptors or elsewhere?

Below, I tried to illustrate this.

+--------------+     +----++--------------+
| Kernel space |     | fd ||  User space  |
|              |     |list||              |
|   handler <---------------- syscall(fd) |
|              |     |    ||              |
+--------------+     +----++--------------+

Regarding your drawing: the fd list is on the kernel side. You cannot create an fd out of the blue; it is given to you by the kernel as the result of another syscall (open, socket, pipe, dup, etc.). Thus, the kernel is always aware of the valid fds you can use. — xhienne, Mar 28 '17 at 16:43
@xhienne From the info in /proc/<pid>/fd I assumed that fds are not global, but that each process instead has its own set (table) of file descriptors. I am aware that the kernel creates the fds; after all, this is done with a syscall. — Iulian Paun, Mar 29 '17 at 07:31
@Julian I didn't say they were global. Each process has its own fd set. I said the fd list part of your drawing was on the kernel side. — xhienne, Mar 29 '17 at 08:30
I don’t understand what you’re asking. If you assume that fds are not global, but instead each process has its own set (table) of file descriptors, then why do you ask “Where would the OS look for this specific fd? In the current process’s file descriptors or elsewhere?”? Where else would the kernel look other than the current process’s list of file descriptors? — G-Man Says 'Reinstate Monica', Mar 31 '17 at 02:45
Where else would the kernel look other than the current process’s list of file descriptors?

Well, I can't say that I know; just wanted to be sure. However, thanks to @lgeorget some things are clearer now. — Iulian Paun, Mar 31 '17 at 11:55

score 1 · Accepted Answer · answered Mar 29 '17 at 11:55

A file descriptor is an integer that identifies one file among all the files opened by a given process. Kernels usually implement this by treating the file descriptor as an index into a per-process table.

The rest of my answer applies to Linux.

In Linux, each valid file descriptor is associated with a struct file. This structure contains a pointer to the inode (the file's data and metadata), the current position of the process in the file, a list of operations (pointers to functions implemented by the file system the file lives in), etc.

To fetch the file structure from the file descriptor, the Linux kernel proceeds as follows. I'll use the read system call as an example.

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    struct fd f = fdget_pos(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);
        ret = vfs_read(f.file, buf, count, &pos);
        if (ret >= 0)
            file_pos_write(f.file, pos);
        fdput_pos(f);
    }
    return ret;
}

The first operation is fdget_pos. It takes the file descriptor passed by the caller in userspace and fetches the corresponding file. It returns a struct fd defined as follows:

struct fd {
    struct file *file;
    unsigned int flags;
};

This is basically a pointer to a struct file, with a couple of flags to remember which operations will be needed when putting the structure back.

Now, how does fdget_pos work? It's actually intricate in strange ways, but it boils down to two basic operations (with more checks that I don't show here for simplicity):

The first one consists of fetching the process's files table. This table is available from a pointer in the calling process's structure (accessible through current):

struct files_struct *files = current->files;

The next operation consists of verifying the validity of the file descriptor:

if (fd < files->fdt->max_fds) // first of all, if the file descriptor is too big, then it cannot be valid
    return files->fdt->fd[fd]; // otherwise, we return the pointer stored in the table of file descriptors (may be NULL)
return NULL;

The table entry may be cleared before the function returns (for instance, if one thread of the process does a read while another closes the same file descriptor). The kernel takes care of this.

If the struct file pointer returned by fdget_pos is NULL, then it means that the file descriptor passed to the system call is invalid. In this case, the system call returns the error code EBADF ("bad file descriptor").

To sum up, file descriptors are just indexes into a per-process table of open files. However, it's not sufficient to just index the table, since the entry may be NULL. Furthermore, the kernel must do additional checks to handle race conditions on the file descriptor.

What do you mean, “a struct file … contains … a list of operations which are actually pointers to functions implemented by the file system the file lives in”? Are you referring to the fact that struct file contains that file descriptor’s level of access to the file? (E.g., even if the file is protected 777, a process can open it O_RDONLY, O_WRONLY, or O_RDWR, and the file structure keeps track of this.) If you’re referring to the fact that files are seekable, but most other things aren’t, and some devices support ioctl, etc., I believe that information is not in the file struct. — G-Man Says 'Reinstate Monica', Mar 31 '17 at 08:19
@G-Man Some operations are in the inode struct (http://lxr.free-electrons.com/source/include/linux/fs.h#L1685) while some are in the file struct (http://lxr.free-electrons.com/source/include/linux/fs.h#L1645), depending on which structure is needed to perform them. — lgeorget, Mar 31 '17 at 08:49
Of course, the virtual file system and the actual one will need to dereference the inode pointer from the file structure in order to perform most of the operations like reading and writing. But seeking for example is done only at the file level, the inode is only used to verify that the file is seekable or not. — lgeorget, Mar 31 '17 at 08:52
(well bad example, seeking also requires the size of the inode + the positions of holes and stuff...) — lgeorget, Mar 31 '17 at 09:06
