How does Bash pipe large amounts of data?

Question

Let's say you want to cat the contents of a really big file, but want to view it a few bits at a time. Let's say one were to do the following:

$ cat /dev/sda1 | less

As a programmer of languages such as Java and ActionScript, when I look at that code I imagine Bash first running the command cat /dev/sda1 (loading everything the command returns into RAM), and then running the command less which has access to that really big "pseudo-variable" represented as -.

Is that the way Bash does things (meaning that command is a really bad idea if the file is larger than the amount of RAM on your system, and you should use another command), or does it have a way of optimising the piping of large amounts of data?

An alternative title to this question could be "Does Bash pipe large amounts of data effectively?" — IQAndreas, Oct 31 '14 at 04:21
At the risk of splitting hairs: bash doesn’t actually implement pipes. When you type cat /dev/sda1 | less, bash creates the pipe between the two processes, but after that it is implemented by the OS kernel. The cat and tail commands run in parallel; see In what order do piped commands run? And, in case this isn’t clear from Karthik’s answer: if the cat gets ahead of the tail (i.e., the cat writes to the pipe faster than the tail reads from it), the operating system will force the cat to pause until the tail catches up with it. — Scott - Слава Україні, Oct 31 '14 at 19:42
(There’s a pun in there somewhere, about a cat chasing its tail, or vice versa.) — Scott - Слава Україні, Oct 31 '14 at 19:43
Does this answer your question? In what order do piped commands run? — G-Man Says 'Reinstate Monica', May 11 '20 at 03:14

score 8 · Accepted Answer · edited Apr 13 '17 at 12:36

No, it doesn't load everything into memory; that would be an impractical design. The output from the left side of the pipe goes into a buffer, and that buffer is connected to the input of the command on the right side of the pipe.

The pipe(7) man page (man 7 pipe) has all the details, as does this other U&L Q&A: How big is the pipe buffer?
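A quick way to convince yourself of the streaming behaviour is to pipe from a command that never finishes. This is only a small sketch: yes and head stand in for cat and less, and the variable name first_lines is made up for the example.

```shell
# yes(1) prints "data" forever, so its complete output could never be
# collected in memory before the right-hand command starts.
# head exits after three lines; yes then stops at its next write to the
# closed pipe (normally via SIGPIPE).
# `|| true` ignores that expected status in case `pipefail` is enabled.
first_lines=$(yes "data" | head -n 3) || true
printf '%s\n' "$first_lines"
```

If the shell ran the left side to completion first, this pipeline would never return. It returns immediately because both commands run at the same time and share a bounded buffer.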

answered Oct 31 '14 at 04:59

slm

score 2 · Answer 2 · answered Oct 31 '14 at 12:24

read will block until data is available, and write will block or fail in case the pipe is full. There are a few parameters, such as PIPE_BUF, PIPE_SIZE and O_NONBLOCK, that play a key role in how a pipe behaves.

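The value of PIPE_BUF can be determined via 'ulimit -a', which reports it as the pipe size in 512-byte blocks. It is defined in limits.h. PIPE_BUF is the largest write that is guaranteed to be atomic, so writes of that size or smaller from several processes or threads are never split up and mixed with each other.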
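For reference, here is one way to print the value from a shell. Treat it as a sketch: getconf is a different route from the 'ulimit -a' output mentioned above, the path / is an arbitrary choice, and the variable names are made up.

```shell
# Ask the system for PIPE_BUF on the root filesystem.
# POSIX requires the value to be at least 512 bytes.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF: ${pipe_buf} bytes"

# Bash-only cross-check: `ulimit -p` prints the pipe size in 512-byte blocks.
# Other shells may give -p a different meaning, so skip it there.
if [ -n "${BASH_VERSION:-}" ] && blocks=$(ulimit -p 2>/dev/null) && [ -n "$blocks" ]; then
  echo "ulimit -p: ${blocks} x 512 = $((blocks * 512)) bytes"
fi
```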

PIPE_SIZE depends on the page size. In the 2.4 kernel it was equal to one page (4 KB). Kernel versions after 2.6 use an array of 16 pages instead (64 KB with 4 KB pages). This is defined in the file pipe_fs_i.h as PIPE_BUFFERS (16). Later kernel versions provide fcntl with F_SETPIPE_SZ, which lets a program change the pipe size.
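The 16-page figure can be worked out for the machine at hand. This sketch only does the arithmetic described above and does not measure a real pipe; the /proc file is Linux-specific and the variable names are made up.

```shell
# Page size as reported by the system (commonly 4096 bytes).
page_size=$(getconf PAGESIZE)

# 16 pages per pipe, as described for kernels after 2.6.
default_capacity=$((page_size * 16))
echo "page size: ${page_size} bytes"
echo "16-page pipe capacity: ${default_capacity} bytes"

# Linux only: the largest size an unprivileged process may request
# with F_SETPIPE_SZ. The file is absent on older kernels and other systems.
if [ -r /proc/sys/fs/pipe-max-size ]; then
  echo "pipe-max-size: $(cat /proc/sys/fs/pipe-max-size) bytes"
fi
```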

O_NONBLOCK allows partial and deferred writes. If O_NONBLOCK is set and the number of bytes to be written is greater than PIPE_BUF, write fails when the pipe is full. Otherwise it may write only part of the data, so the caller has to check the return value of write, and the bytes that were written may be interleaved with data from other processes.
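The atomicity guarantee for small writes can be seen with two writers sharing one pipe. This is a sketch under one assumption: each printf call ends up as a single write of far fewer than PIPE_BUF bytes, which is typical for a shell builtin but not something the shell promises. The writer-A and writer-B labels are made up.

```shell
# Two background loops write short lines into the same pipe.
# The order of lines from A and B is not defined, but each line is far
# smaller than PIPE_BUF, so no line should be split by the other writer.
lines=$(
  {
    for i in 1 2 3; do printf 'writer-A line %d\n' "$i"; done &
    for i in 1 2 3; do printf 'writer-B line %d\n' "$i"; done &
    wait
  } | sort
)
printf '%s\n' "$lines"
```

The reader sees one byte stream with nothing marking which writer produced which bytes. The lines are only distinguishable here because each writer labels its own output.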

When that data gets interleaved, does that mean the reader cannot differentiate which bytes come from which writer? — CMCDragonkai, May 01 '15 at 03:40

score 0 · Answer 3 · edited Dec 12 '16 at 14:27

Try the -B option; with it, less uses only a 64K buffer.

cat /dev/sda1 | less -B

From man less:

-B or --auto-buffers By default, when data is read from a pipe, buffers are allocated automatically as needed. If a large amount of data is read from the pipe, this can cause a large amount of memory to be allocated. The -B option disables this automatic allocation of buffers for pipes, so that only 64K (or the amount of space specified by the -b option) is used for the pipe. Warning: use of -B can result in erroneous display, since only the most recently viewed part of the file is kept in memory; any earlier data is lost.
