The description in the open(2)
man page gives some clues to start with:
O_PATH (since Linux 2.6.39)
Obtain a file descriptor that can be used for two purposes:
to indicate a location in the filesystem tree and to perform
operations that act purely at the file descriptor level.
The file itself is not opened, and other file operations
(e.g., read(2), write(2), fchmod(2), fchown(2),
fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.
Sometimes, we don't want to open a file or a directory. Instead, we just want a reference to that filesystem object in order to perform certain operations (e.g., to fchdir()
to a directory referred to by a file descriptor that we opened using O_PATH
). So, a trivial point: if this is our purpose, then opening with O_PATH
should be a little cheaper, since the file itself is not actually opened.
And a less trivial point: before the existence of O_PATH
, the way of obtaining such a reference to a filesystem object was to open the object with O_RDONLY
. But the use of O_RDONLY
requires that we have read permission on the object. However, there are various use cases where we don't need to actually read the object: for example, executing a binary or accessing a directory (fchdir()
) or reaching through a directory to touch an object inside the directory.
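As a rough illustration of the two points above, here is a minimal sketch in C. The helper names errno_from_read_on_opath() and chdir_via_opath() are hypothetical, invented for this example. The first shows that a descriptor opened with O_PATH cannot be read from; the second shows that it is still good enough for fchdir(), which, as far as I know, accepts O_PATH descriptors on Linux 3.5 and later.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* Needed for O_PATH in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open 'path' with O_PATH and try to read from it.
   Returns the errno set by read() (EBADF is expected), 0 if read()
   unexpectedly succeeded, or -1 if the open() itself failed. */
int errno_from_read_on_opath(const char *path)
{
    char c;
    int result;
    int fd = open(path, O_PATH);

    if (fd == -1)
        return -1;

    /* Save errno before close() has a chance to modify it */
    result = (read(fd, &c, 1) == -1) ? errno : 0;
    close(fd);
    return result;
}

/* Hypothetical helper: change the current working directory using
   only an O_PATH descriptor; read permission on the directory is
   not required, only search (execute) permission.
   Returns 0 on success, -1 on error. */
int chdir_via_opath(const char *dirpath)
{
    int rc;
    int dirfd = open(dirpath, O_PATH | O_DIRECTORY);

    if (dirfd == -1)
        return -1;

    rc = fchdir(dirfd);
    close(dirfd);
    return rc;
}
```

Calling errno_from_read_on_opath("/") should return EBADF, matching the man page text quoted earlier, while chdir_via_opath("/") should succeed.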
Usage with "*at()" system calls
The common, but not the only, use of O_PATH
is to open a directory, in order to have a reference to that directory for use with the "*at" system calls, such as openat()
, fstatat()
, fchownat()
, and so on. This family of system calls, which we can roughly think of as the modern successors to the older system calls with similar names (open()
, stat()
, chown()
, and so on), serves a couple of purposes, the first of which you touch on when you ask "why do I want to use a file descriptor instead of the directory's path?". If we look further down in the open(2)
man page, we find this text (under a subheading with the rationale for the "*at" system calls):
First, openat() allows an application to avoid race conditions
that could occur when using open() to open files in directories
other than the current working directory. These race conditions
result from the fact that some component of the directory prefix
given to open() could be changed in parallel with the call to
open(). Suppose, for example, that we wish to create the file
path/to/xxx.dep if the file path/to/xxx exists. The problem is
that between the existence check and the file creation step, path
or to (which might be symbolic links) could be modified to point
to a different location. Such races can be avoided by opening a
file descriptor for the target directory, and then specifying that
file descriptor as the dirfd argument of (say) fstatat(2) and
openat().
To make this more concrete... Suppose we have a program that wants to perform multiple operations in a directory other than its current working directory, meaning that we must specify some directory prefix as part of the filenames we use. Suppose, for example, that the pathname is /dir1/dir2/file
and we want to perform two operations:
- Perform some check on
/dir1/dir2/file
(e.g., who owns the file, or when it was last modified).
- If we are satisfied with the result of that check, perhaps we then want to do some other filesystem operation in the same directory, for example, creating a file called
/dir1/dir2/file.new
.
Now, first suppose we did everything using traditional pathname-based system calls:
struct stat statbuf;
stat("/dir1/dir2/file", &statbuf);
if ( /* Info returned in statbuf is to our liking */ ) {
    fd = open("/dir1/dir2/file.new", O_CREAT | O_RDWR, 0600);
    /* And then populate file referred to by fd */
}
Now, furthermore suppose that in the directory prefix /dir1/dir2
one of the components (say dir2
) was actually a symbolic link (that refers to a directory), and that between the call to stat()
and the call to open()
a malicious person was able to change the target of the symbolic link dir2
to point to a different directory. This is a classic time-of-check-time-of-use race condition. Our program checked a file in one directory but was then tricked into creating a file in a different directory -- perhaps a security-sensitive directory. The key point here is that the pathname /dir1/dir2
looked the same, but what it refers to changed completely.
We can avoid these sorts of problems using the "*at" calls. First of all, we obtain a handle referring to the directory where we will do our work:
dirfd = open("/dir1/dir2", O_PATH);
The critical point here is that dirfd
is a stable reference to the directory that was referred to by the path /dir1/dir2
at the time of the open()
call. If the target of the symbolic link dir2
is subsequently changed, this will not affect what dirfd
refers to. Now, we can do our check + operation using the "*at" calls that are equivalent to the stat()
and open()
calls above:
struct stat statbuf;
fstatat(dirfd, "file", &statbuf, 0);
if ( /* Info returned in statbuf is to our liking */ ) {
    fd = openat(dirfd, "file.new", O_CREAT | O_RDWR, 0600);
    /* And then populate file referred to by fd */
}
During these steps any manipulation of symbolic links in the pathname /dir1/dir2
will have no impact: the check (fstatat()
) and the operation (openat()
) are guaranteed to take place in the same directory.
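Putting the pieces together, the check + operation sequence might be sketched as a single function. This is only an illustration: the name create_beside_if_owned() is hypothetical, and the particular check (a regular file owned by the caller) merely stands in for whatever "to our liking" means in a real program. The O_EXCL and AT_SYMLINK_NOFOLLOW flags are my own additions rather than part of the fragments above; they make the sketch refuse to reuse an existing file or to follow a symbolic link in the final component.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* Needed for O_PATH in <fcntl.h> */
#endif
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create 'newname' inside the directory 'dirpath',
   but only if 'name' in that same directory is a regular file owned by
   the caller. Both steps go through one O_PATH descriptor, so they act
   on the same directory even if 'dirpath' is re-pointed in between.
   Returns a descriptor for the new file, or -1 on error or failed check. */
int create_beside_if_owned(const char *dirpath, const char *name,
                           const char *newname)
{
    struct stat statbuf;
    int fd = -1;
    int dirfd = open(dirpath, O_PATH | O_DIRECTORY);

    if (dirfd == -1)
        return -1;

    /* The check: performed relative to dirfd, not to a pathname prefix */
    if (fstatat(dirfd, name, &statbuf, AT_SYMLINK_NOFOLLOW) == 0 &&
            S_ISREG(statbuf.st_mode) &&
            statbuf.st_uid == geteuid()) {
        /* The operation: takes place in the same directory as the check */
        fd = openat(dirfd, newname, O_CREAT | O_EXCL | O_RDWR, 0600);
    }

    close(dirfd);
    return fd;
}
```

A call such as create_beside_if_owned("/dir1/dir2", "file", "file.new") resolves the directory prefix exactly once, at the open() call; everything after that is relative to dirfd.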
There is another purpose to using the "*at()" calls, which relates to the idea of "per-thread current working directories" in multithreaded programs (and again we could open the directories using O_PATH
), but I think this use is probably less relevant to your question, and I leave you to read the open(2)
man page if you'd like to know more.
Usage with file descriptors for regular files
One usage of O_PATH
with regular files is to open a binary for which we have execute permission (but not necessarily read permission, so that we could not open the file with O_RDONLY
). That file descriptor can then be passed to fexecve(3)
to execute the program. All that fexecve(fd, argv, envp)
is doing with its fd
argument is essentially:
snprintf(buf, sizeof(buf), "/proc/self/fd/%d", fd);
execve(buf, argv, envp);
(Although, starting with glibc 2.27, the implementation will instead make use of the execveat(2)
system call, on kernels that provide that system call.)
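For completeness, here is a hedged sketch of that usage. The helper name run_via_opath_fd() is hypothetical; it opens the binary with O_PATH, forks, and has the child call fexecve() on the descriptor. It assumes a Linux system where fexecve() can work with an O_PATH descriptor (via execveat(2) or via /proc/self/fd).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* Needed for O_PATH in <fcntl.h> */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: execute the program at 'path' in a child process
   using an O_PATH descriptor and fexecve(). Only execute permission on
   the file is needed, not read permission.
   Returns the child's exit status, or -1 on error. */
int run_via_opath_fd(const char *path, char *const argv[])
{
    int status;
    pid_t pid;
    int fd = open(path, O_PATH);

    if (fd == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }

    if (pid == 0) {             /* Child: replace ourselves with the program */
        fexecve(fd, argv, environ);
        _exit(127);             /* Reached only if fexecve() failed */
    }

    close(fd);                  /* Parent no longer needs the descriptor */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with argv set to { "sh", "-c", "exit 7", NULL }, calling run_via_opath_fd("/bin/sh", argv) should return 7 on a system where /bin/sh exists.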