I've been reading up about how pipes are implemented in the Linux kernel and wanted to validate my understanding. If I'm incorrect, the answer with the correct explanation will be selected.

  • Linux has a VFS called pipefs that is mounted in the kernel (not in user space)
  • pipefs has a single super block and is mounted at its own root (pipe:), alongside /
  • pipefs cannot be viewed directly unlike most file systems
  • The entry to pipefs is via the pipe(2) syscall
  • The pipe(2) syscall used by shells for piping with the | operator (or manually from any other process) creates a new file in pipefs which behaves pretty much like a normal file
  • The file on the left hand side of the pipe operator has its stdout redirected to the temporary file created in pipefs
  • The file on the right hand side of the pipe operator has its stdin set to the file on pipefs
  • pipefs is stored in memory and through some kernel magic, shouldn't be paged
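
A few of these points can be checked from user space. The following is a minimal sketch, assuming Linux with /proc mounted; the function name describe_pipe is hypothetical, and Python's os module is used only because it exposes the underlying syscalls fairly directly. It creates a pipe, confirms that both ends look like FIFOs to fstat, and reads the /proc symlink for the read end, which on Linux typically looks like pipe:[<inode>] rather than a path under /.

```python
import os
import stat


def describe_pipe():
    """Create a pipe and report how the kernel presents its two ends."""
    r, w = os.pipe()  # r is the read end, w is the write end
    try:
        info = {
            # Both descriptors should report the FIFO file type.
            "read_is_fifo": stat.S_ISFIFO(os.fstat(r).st_mode),
            "write_is_fifo": stat.S_ISFIFO(os.fstat(w).st_mode),
        }
        # Assumption: Linux-style /proc. Elsewhere this entry is None.
        if os.path.exists("/proc/self/fd"):
            info["proc_link"] = os.readlink(f"/proc/self/fd/{r}")
        else:
            info["proc_link"] = None
        return info
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(describe_pipe())
```

The pipe: prefix in the symlink target is the closest user space gets to seeing pipefs; there is no directory to list.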

Is this explanation of how pipes (e.g. ls -la | less) function pretty much correct?

One thing I don't understand is how something like bash would set a process' stdin or stdout to the file descriptor returned by pipe(2). I haven't been able to find anything about that yet.

1 Answer

Your analysis so far is generally correct. The way a shell might set the stdin of a process to a pipe descriptor could be (pseudocode):

pipe(p) // create a new pipe: p[0] is the read end, p[1] is the write end
if fork() == 0 // spawn a child process; the lines below run in the child
    close(p[1]) // close the unused write end of the pipe in the child
    dup2(p[0], 0) // duplicate the read end on top of fd 0 (stdin)
    close(p[0]) // close the original descriptor; fd 0 still refers to the pipe
    exec() // run a new program with the new descriptors in place
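
For something runnable, here is a hedged sketch of the same sequence using Python's os module, whose pipe, fork, dup2, and execvp functions wrap the syscalls of the same names. The helper name run_with_stdin_from_pipe is hypothetical, and the second pipe is an addition of this sketch so the parent can capture the child's output; a real shell does more (job control, error handling, arbitrary input sizes). Note that os.pipe() returns the read end first.

```python
import os


def run_with_stdin_from_pipe(argv, data):
    """Run argv with stdin connected to a pipe, feed it data, return (stdout, status).

    Sketch only: assumes `data` fits in the pipe buffer (a few KiB is safe).
    """
    in_r, in_w = os.pipe()    # carries data to the child's stdin
    out_r, out_w = os.pipe()  # extra pipe, only so the parent can capture output
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(in_w)    # child never writes to its own stdin pipe
            os.close(out_r)   # child never reads its own output
            os.dup2(in_r, 0)  # read end now also available as fd 0
            os.dup2(out_w, 1) # capture pipe now also available as fd 1
            os.close(in_r)    # originals are no longer needed
            os.close(out_w)
            os.execvp(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)     # reached only if exec failed
    # parent
    os.close(in_r)
    os.close(out_w)
    os.write(in_w, data)
    os.close(in_w)            # the child sees EOF once all write ends are closed
    chunks = []
    while chunk := os.read(out_r, 4096):
        chunks.append(chunk)
    os.close(out_r)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), status


if __name__ == "__main__":
    print(run_with_stdin_from_pipe(["tr", "a-z", "A-Z"], b"hello\n"))
```

Closing the unused ends matters: if the parent kept in_w open, the child would never see end-of-file on its stdin.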
Greg Hewgill
  • Thanks! Just curious why the dup2 call is needed, and you can't just directly assign the pipe descriptor to stdin? – Brandon - Free Palestine Aug 04 '14 at 23:42
  • The caller doesn't get to choose what the numeric value of the file descriptor is when it is created in pipe(). The dup2() call allows the caller to copy the file descriptor to a specific numeric value (needed because 0, 1, 2 are stdin, stdout, stderr). That is the kernel equivalent of "assigning directly to stdin". Note that the C runtime library global variable stdin is a FILE *, which is not kernel related (although it is initialised to be connected to descriptor 0). – Greg Hewgill Aug 04 '14 at 23:43
  • Great answer! I am a little lost in the details. Just wondering why you close the descriptor you passed to dup2 before running exec()? Once dup2 returns, wouldn't that descriptor point to fd 0? Then closing it closes file descriptor 0. Then how can we read from the stdin of the child process? – user1559897 Dec 07 '18 at 14:33
  • @user1559897: The dup2 call does not change the descriptor passed to it. Instead, it makes the two handles (the original pipe descriptor and 0) refer to the same kernel object (the pipe). Since the child process doesn't need two stdin handles (and wouldn't know the numeric value of the original descriptor anyway), the original is closed before exec. – Greg Hewgill Dec 07 '18 at 16:56
  • @GregHewgill Gotchu. Thx! – user1559897 Dec 07 '18 at 18:36
  • How does the shell decide when to terminate the processes on either end? Does it just kill both ends as soon as one of them finishes (as it appears to when I do yes | head -1)? – Ben Sep 28 '20 at 00:17
  • @Ben: Look up the SIGPIPE signal. In your example, when head terminates, yes will receive a SIGPIPE, and the default handler terminates the process. – Greg Hewgill Sep 28 '20 at 00:34
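
The SIGPIPE behaviour from the last comment can be reproduced without a shell. This is a rough sketch, assuming a POSIX system; the function name writer_killed_by_sigpipe is hypothetical. The child stands in for yes and the parent for head -1. One Python-specific detail: the interpreter ignores SIGPIPE at startup, so the child restores the default disposition to behave like an ordinary C program.

```python
import os
import signal


def writer_killed_by_sigpipe():
    """Return (first_line, killed) where killed is True if SIGPIPE ended the writer."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: plays the role of `yes`
        try:
            os.close(r)  # the writer holds no read end
            # Python ignores SIGPIPE by default; restore the default action.
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            while True:
                os.write(w, b"y\n")  # blocks when the pipe is full
        finally:
            os._exit(0)  # reached only if the signal did not kill us
    # parent: plays the role of `head -1`
    os.close(w)
    first = os.read(r, 2)  # take one line
    os.close(r)            # last read end is gone; the next write raises SIGPIPE
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGPIPE
    return first, killed


if __name__ == "__main__":
    print(writer_killed_by_sigpipe())
```

The shell itself does not kill anything here; the writer dies because the kernel signals it once no process has the read end open.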