
Brian Kernighan explains in this video that the early Bell Labs attraction to small languages/programs was rooted in memory limitations:

A big machine would be 64 k-bytes--K, not M or G--and so that meant any individual program could not be very big, and so there was a natural tendency to write small programs, and then the pipe mechanism, basically input output redirection, made it possible to link one program to another.

But I don't understand how this could limit memory usage, considering that the data has to be stored in RAM to be transmitted between programs.

From Wikipedia:

In most Unix-like systems, all processes of a pipeline are started at the same time [emphasis mine], with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: for example a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer. When the receiving program is ready to read data, the next program in the pipeline reads from the buffer. In Linux, the size of the buffer is 65536 bytes (64KB). An open source third-party filter called bfr is available to provide larger buffers if required.

This confuses me even more, as this completely defeats the purpose of small programs (though they would be modular up to a certain scale).

The only answer I can think of to my first question (the memory limitations being problematic depending on the size of the data) is that large data sets simply weren't computed back then, and that the real problem pipelines were meant to solve was the amount of memory required by the programs themselves. But given the bolded text in the Wikipedia quote, even this confuses me, since the programs are not run one at a time.

All this would make a great deal of sense if temp files were used, but it's my understanding that pipes do not write to disk (unless swap is used).

Example:

sed 'simplesubstitution' file | sort | uniq > file2

It's clear to me that sed is reading in the file and spitting it out on a line-by-line basis. But sort, as BK states in the linked video, is a full stop, so all of the data has to be read into memory (or does it?), and then it's passed on to uniq, which (to my mind) would be a one-line-at-a-time program. But between the first and second pipe, all the data has to be in memory, no?


3 Answers


The data doesn’t need to be stored in RAM. Pipes block their writers if the readers aren’t there or can’t keep up; under Linux (and most other implementations, I imagine) there’s some buffering but that’s not required. As mentioned by mtraceur and JdeBP (see the latter’s answer), early versions of Unix buffered pipes to disk, and this is how they helped limit memory usage: a processing pipeline could be split up into small programs, each of which would process some data, within the limits of the disk buffers. Small programs take less memory, and the use of pipes meant that processing could be serialised: the first program would run, fill its output buffer, be suspended, then the second program would be scheduled, process the buffer, etc. Modern systems are orders of magnitude larger than the early Unix systems, and can run many pipes in parallel; but for huge amounts of data you’d still see a similar effect (and variants of this kind of technique are used for “big data” processing).

In your example,

sed 'simplesubstitution' file | sort | uniq > file2

sed reads data from file as necessary, then writes it as long as sort is ready to read it; if sort isn’t ready, the write blocks. The data does indeed live in memory eventually, but that’s specific to sort, and sort is prepared to deal with any issues (it will use temporary files if the amount of data to sort is too large).
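You can make the spill-to-disk behaviour visible; here is a rough sketch assuming GNU sort (the -S and -T options and the spill-file naming are GNU-specific, and bigfile is a placeholder):

sort -S 1M -T /tmp bigfile > file2 &   # cap sort's in-memory buffer at 1 MiB so it must spill
ls /tmp/sort*                          # GNU sort's temporary spill files (the name pattern may vary)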

You can see the blocking behaviour by running

strace seq 1000000 -1 1 | (sleep 120; sort -n)

This produces a fair amount of data and pipes it to a process which isn’t ready to read anything for the first two minutes. You’ll see a number of write operations go through, but very quickly seq will stop and wait for the two minutes to elapse, blocked by the kernel (the write system call waits).

Stephen Kitt
  • Where is the data in the meantime? Is it on the disk? I'm a bit fuzzy on this. I'll give an example for clarity. – mas Jun 20 '18 at 13:51
  • This answer could benefit from additionally explaining why splitting programs into many small ones saves memory: a program had to be able to fit in memory to run, but only the currently running program. Every other program was swapped out to disk in early Unix, with only one program even swapped into actual RAM at a time. So the CPU would run one program, which would write to a pipe (which back then was on disk), swap out that program, and swap in the one that read from the pipe. An elegant way to turn a logically parallel assembly line into incremental serialized execution. – mtraceur Jun 20 '18 at 17:06
  • @mtraceur If that's the case, what does the bolded quote from Wikipedia mean? – mas Jun 20 '18 at 17:25
  • @malan: Multiple processes can be started and can be in a runnable state at the same time. But at most one process can be executing on each physical CPU at any given time, and it's the job of the kernel's process scheduler to allocate "slices" of CPU time to each runnable process. In modern systems, a process that is runnable but not currently scheduled a CPU timeslice usually remains resident in memory while it's waiting for its next slice, but the kernel is allowed to page the memory of any process out to disk and back into memory again as it finds convenient. (Handwaving some details here.) – Daniel Pryden Jun 20 '18 at 19:09
  • Processes on either side of a pipe can behave effectively like co-routines: one side writes until it fills up the buffer and the write blocks, at which point the process can't do anything with the rest of its timeslice and it goes into an IO wait mode. Then the OS gives the remainder of the timeslice (or another upcoming timeslice) to the reading side, which reads until there's nothing left in the buffer and the next read blocks, at which point the reader process can't do anything with the rest of its timeslice and yields back to the OS. Data goes through the pipe one buffer's worth at a time. – Daniel Pryden Jun 20 '18 at 19:13
  • @DanielPryden Okay, I get time slicing and concurrency. But mtraceur says "A program had to be able to fit in-memory to run, but only the currently running program.", where you make explicit that the process resides in memory until it's given a timeslice. This is a direct contradiction. The only solution in a limited system seems to be writing to disk, which JdeBP makes clear in his answer. Effectively, every program in a Unix pipeline is loaded into memory (or disk tempfiles) during execution. – mas Jun 20 '18 at 19:25
  • This seems to reduce the concept of pipeline from 'Reduces amount of memory allocated to a single program' as elucidated by @mtraceur to 'At least it modularizes the programs so you can pipe as many as you can fit into memory without worrying about some ridiculously monolithic program fitting into memory for a simple sed 'substitute' | sort | uniq operation.' But the brilliance of the pipeline always praised seems to always be a variation of what mtraceur says. – mas Jun 20 '18 at 19:28
  • Sorry, @mtraceur addresses exactly this: 'Every other program was swapped out to disk in early Unix, with only one program even swapped into actual RAM at a time. So the CPU would run one program, which would write to a pipe (which back then was on disk), swap out that program and swap in the one that read from the pipe.' – mas Jun 20 '18 at 19:36
  • @StéphaneChazelas: POSIX specifically requires that sort be able to sort files larger than would fit in memory, e.g. via temporary files. – R.. GitHub STOP HELPING ICE Jun 20 '18 at 21:08
  • @malan The programs are started "at the same time" conceptually on all Unix systems, just on modern multiprocessor systems with enough RAM to hold them that means they're literally all held in RAM at the same time, while on a system that can't hold them all in RAM at the same time, some get swapped out to disk. Also note that "memory" in a lot of contexts means virtual memory which is the sum of both RAM space and swap space on disk. Wikipedia is focusing on the concept rather than on the implementation details, especially because how really old Unix did things is less relevant now. – mtraceur Jun 20 '18 at 21:28
  • @malan Also, the contradiction you are seeing comes from the two different meanings of "memory" (RAM vs RAM+swap). I was talking about hardware RAM only, and in that context only the code currently being executed by the CPU needs to fit in RAM (which was what was affecting the decisions Kernighan is talking about), while in the context of all programs being logically executed by the OS at a given time (at the abstract level provided on top of time slicing) a program just needs to fit within the entire virtual memory available to the OS, which includes swap space on disk. – mtraceur Jun 20 '18 at 21:40
  • @mtraceur and you don’t even need swap for pipes & co. to help reduce memory usage since the executables themselves are their own backing store (perhaps not on early Unix though). – Stephen Kitt Jun 20 '18 at 22:06
  • @StephenKitt Great point, thanks! I got so focused on trying to address the conceptual issues that I totally forgot that particular additional implementation detail. – mtraceur Jun 20 '18 at 22:21
  • I touched on segment swapping, the precursor to demand paging, in https://unix.stackexchange.com/questions/293149/ ; but I suspect that we need a more direct question and answer. – JdeBP Jun 21 '18 at 00:19
  • @DanielPryden you underestimate those old computers. Just as an example: IBM mainframes had two Fortran compilers. One used 64 KB of RAM, and that was the huge, optimizing one; the smaller one took just 16 KB. Minicomputers of the 1970s usually had ~100 KB of RAM, shared between 5-10 users. These minis ran multiple programs concurrently in that amount of RAM, although the programs were pretty simple by today's standards (such as ed instead of the later vim), and they relied heavily on swapping to HDD to let multiple programs run concurrently. The buffer size between programs was probably just 512 bytes. – Bulat Jun 21 '18 at 21:56
  • Just as an example: the HP 3000 wiki article points to a 1975 journal where you can find the following ad: https://books.google.ru/books?id=TgGty5HMPj0C&lpg=RA1-PA6&pg=RA1-PA11#v=onepage&q&f=true - they sell the DataPoint 5500 with 64K RAM and up to 16 terminals supported. You can find a lot of interesting info about the oldies in these journals. – Bulat Jun 21 '18 at 22:10
  • sort is not a good example here, as it always needs the input file descriptor to be closed before it can start working, so it has to buffer the full input. In that sense, sort is not well suited for pipes at all and only works so well because common inputs are small. – allo Jun 22 '18 at 08:02
  • When I used FORTRAN IV, there was an overlay statement which allowed you to load one of several sets of subroutines, etc. which would be loaded into memory and then executed. When you were done with that one, you could load another set. This was because virtual memory wasn't a thing yet. So, they really do mean that only the part of a program executing now needs to be in primary memory. – Joe Jun 23 '18 at 07:09
  • Can I stream (pass through a pipe) more data than the hard disk can hold? – Sergey Bushmanov Feb 02 '22 at 17:43
  • @Sergey yes, try running yes | pv > /dev/null — if you leave it running it will process an ever-growing amount of data, bounded only by the time you’re willing to give it. – Stephen Kitt Feb 02 '22 at 18:43

But I don't understand how this could limit memory usage, considering that the data has to be stored in RAM to be transmitted between programs.

This is your fundamental error. Early versions of Unix didn't hold pipe data in RAM. They stored it on disc. Pipes had i-nodes on a disc device that was designated the pipe device. The system administrator ran a program named /etc/config to specify (amongst other things) which volume on which disc was the pipe device, which volume was the root device, and which the dump device.

The amount of pending data was constrained by the fact that only the direct blocks of the i-node on disc were used for storage. This mechanism made the code simpler, because much the same algorithm was employed for reading from a pipe as for reading from a regular file, with some tweaks caused by the fact that pipes are non-seekable and the buffer is circular.

This mechanism was replaced by others in the middle to late 1980s. SCO XENIX gained the "High Performance Pipe System", which replaced i-nodes with in-core buffers. 4BSD made unnamed pipes into socketpairs. AT&T reimplemented pipes using the STREAMS mechanism.

And of course the sort program performed a limited internal sort of 32KiB chunks of input (or whatever smaller amount of memory it could allocate if 32KiB was not available), writing the sorted results to intermediate stmX?? files in /usr/tmp/ which it then externally merge sorted to provide the final output.
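The same external merge-sort idea is easy to reproduce with today's tools; a minimal sketch (the file and chunk names are made up for illustration):

split -l 100000 bigfile chunk.                 # cut the input into pieces that fit in memory
for f in chunk.*; do sort "$f" -o "$f"; done   # sort each piece on its own
sort -m chunk.* > sorted                       # merge the already-sorted pieces
rm chunk.*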

Further reading

  • Steve D. Pate (1996). "Inter-process communication". UNIX Internals: A Practical Approach. Addison-Wesley. ISBN 9780201877212.
  • Maurice J. Bach (1987). "System Calls for the File System". The Design of The Unix Operating System. Prentice-Hall. ISBN 0132017571.
  • Steven V. Earhart (1986). "config(1M)". Unix Programmer's Manual: 3. System Administration Facilities. Holt, Rinehart, and Winston. ISBN 0030093139. pp. 23–28.
  • Abhijit Menon-Sen (2020-03-23). How are Unix pipes implemented?. toroid.org.
JdeBP

You are partially correct, but only by accident.

In your example, all data must indeed have been read "in between" the pipes, but it need not be resident in memory (including virtual memory). Usual implementations of sort can sort datasets that won't fit in RAM by doing partial sorts to tempfiles and merging. However, it is a given fact that you cannot possibly output a sorted sequence before having read each and every element. That's pretty obvious. So yes, sort can only start outputting to the second pipe after having read (and done whatever, possibly partially sorting tempfiles) everything from the first. But it doesn't necessarily have to keep it all in RAM.

However, this has nothing to do with how pipes work. Pipes can be named (traditionally they were all named) which means nothing more and nothing less than they have a location in the file system, like files. And that's just what pipes were once upon a time, files (with writes coalesced as much as physical memory availability would allow, as an optimization).
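You can still observe this today with a named pipe; a minimal sketch (mkfifo is POSIX, and the path and somefile are arbitrary placeholders):

mkfifo /tmp/mypipe
ls -l /tmp/mypipe                  # the leading 'p' in the mode marks it as a FIFO
gzip -c somefile > /tmp/mypipe &   # the writer blocks until a reader opens the FIFO
wc -c < /tmp/mypipe                # the reader drains the pipe; no regular file is ever written
rm /tmp/mypipe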

Nowadays, pipes are a small, finite-size kernel buffer to which data is copied; at least, that's what conceptually happens. If the kernel can help it, copies are elided by playing VM tricks (for example, piping from a file usually just makes the same page available for the other process to read, so in the end there is only one read operation, not two copies, and no memory beyond what the buffer cache already uses is needed). In some situations you may get 100% zero-copy, too, or something very close.

If pipes are small and finite-size, then how can this work for any unknown (possibly large) amount of data? That's simple: When nothing more fits, the write blocks until there's room again.
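You can watch exactly that happen; a sketch assuming Linux's default 64 KiB pipe buffer (head -c is widespread but not strictly POSIX):

( i=0; while [ "$i" -lt 32 ]; do
      i=$((i+1))
      head -c 4096 /dev/zero          # push one 4 KiB page into the pipe
      echo "wrote $((i*4)) KiB" >&2   # progress goes to the terminal, not the pipe
  done ) | ( sleep 5; cat > /dev/null )

The progress messages pause after "wrote 64 KiB": the pipe is full, and the writer stays blocked until cat starts draining it five seconds later.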

The philosophy of many simple programs was most useful once upon a time when memory was very scarce. Because, well, you could do work in small steps, one at a time. Nowadays, the advantages are, apart from some extra flexibility, I daresay, not that great any more.
However, pipes are implemented very efficiently (they had to be!), so there is no disadvantage either, and it's an established thing which is working fine and which people are used to, so there is no need to change the paradigm.

Damon
  • When you say 'pipes were named' (JdeBP seems to say there was one 'pipe device'), does that mean there was a limit to the number of pipes that could be used at a given time (i.e., there was a limit to how many times you could use | in a command)? – mas Jun 22 '18 at 13:14
  • I have never seen such a limit, and I do not think that in theory there ever was one. In practice, anything that has a filename needs an inode, and the number of inodes is, of course, finite. As is the number of physical pages on a system, if nothing else. Modern systems guarantee 4k atomic writes, so each pipe must at least own one complete 4k page, which puts a hard limit on the number of pipes you can have. But consider having a couple of gigabytes of RAM... practically, that's a limit you will never encounter. Try and type a few million pipes on a terminal... :) – Damon Jun 22 '18 at 14:01