In this video, Brian Kernighan explains that the early Bell Labs attraction to small languages/programs was based on memory limitations:
A big machine would be 64 k-bytes--K, not M or G--and so that meant any individual program could not be very big, and so there was a natural tendency to write small programs, and then the pipe mechanism, basically input output redirection, made it possible to link one program to another.
But I don't understand how this could limit memory usage, given that the data has to be stored in RAM to be transmitted between programs.
From Wikipedia:
In most Unix-like systems, all processes of a pipeline are started at the same time [emphasis mine], with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: for example a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer. When the receiving program is ready to read data, the next program in the pipeline reads from the buffer. In Linux, the size of the buffer is 65536 bytes (64KB). An open source third-party filter called bfr is available to provide larger buffers if required.
This confuses me even more, since buffering seems to defeat the purpose of small programs (though they would still be modular up to a certain scale).
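To check my understanding of the buffering described above, I tried a small experiment (on a Linux box; the 64 KiB figure comes from the quote, and head -c is from GNU/BSD coreutils):

# the writer fills the ~64 KiB pipe buffer and then blocks until the reader
# starts draining it five seconds later; no bytes are lost along the way
{ head -c 100000 /dev/zero; echo 'writer finished' >&2; } | { sleep 5; wc -c; }

"writer finished" only shows up after about five seconds, and wc still reports 100000, so the kernel apparently just blocks the writer rather than losing or spilling data.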
The only thing I can think of as an answer to my first question (the memory limitations being problematic depending on the size of the data) is that large data sets simply weren't computed back then, and the real problem pipelines were meant to solve was the amount of memory required by the programs themselves. But given the emphasized text in the Wikipedia quote, even this confuses me, since the programs in a pipeline are not run one at a time.
All this would make a great deal of sense if temp files were used, but it's my understanding that pipes do not write to disk (unless swap is used).
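Poking around seems to support that: at least on Linux, /proc shows the descriptors of a pipeline's processes pointing at anonymous pipe objects rather than at files on disk (the sleep pipeline below is just a placeholder):

sleep 30 | sleep 30 &
ls -l /proc/$!/fd    # stdin of the second sleep shows as 'pipe:[<inode>]', not a file path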
Example:
sed 'simplesubstitution' file | sort | uniq > file2
It's clear to me that sed is reading in the file and spitting it out on a line-by-line basis. But sort, as BK states in the linked video, is a full stop, so all of the data has to be read into memory (or does it?), then it's passed on to uniq, which (to my mind) would be a one-line-at-a-time program. But between the first and second pipe, all the data has to be in memory, no?
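For what it's worth, the GNU sort man page suggests it does not have to keep everything in RAM: it can spill sorted runs to temporary files and merge them, which would explain how the "full stop" stage avoids blowing the memory limit. The 10M buffer size and /tmp directory below are just illustrative values:

# -S caps sort's in-memory buffer; -T says where to put its temporary files
sed 'simplesubstitution' file | sort -S 10M -T /tmp | uniq > file2

If that's right, the pipe buffer and sort's own temporary files together keep any single stage from needing the whole data set in RAM at once, but that is exactly what I'd like confirmed.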
"unless swap is used" – swap is always used when there is not enough RAM – edc65 Jun 20 '18 at 19:29