
From reading this, it seems that when copying data to a different hard drive, cat automatically uses the optimum block size (or very near it).

I wonder how it determines the optimum block size, and whether the method cat uses can be applied to dd somehow.

EmmaV



In the simplest case, the main loop of GNU cat looks like this (function simple_cat from cat.c):

while (true)
    {
        /* Read a block of input. */
        n_read = safe_read (input_desc, buf, bufsize);

        /* ... */
    }

The question then becomes: how is bufsize set? The answer is that it comes from io_blksize (insize = io_blksize (stat_buf)), which is defined as follows:

static inline size_t
io_blksize (struct stat sb)
{
  return MAX (IO_BUFSIZE, ST_BLKSIZE (sb));
}

where ST_BLKSIZE gives the operating system's idea of the file system's preferred I/O block size (as returned by stat), and IO_BUFSIZE is defined as 128*1024 (128 KiB). Here is an excerpt of the Linux stat(2) documentation:

blksize_t st_blksize; /* blocksize for file system I/O */ (...)

The st_blksize field gives the "preferred" blocksize for efficient
file system I/O. (Writing to a file in smaller chunks may cause
an inefficient read-modify-rewrite.)

So it seems that GNU cat reads in blocks of 128 KiB or the file system's recommended I/O block size, whichever is larger.

dhag
  • For a deeper understanding, perhaps you should mention where ST_BLKSIZE is coming from — to help OP understand that this is no "magic" but a file property that can be queried through stat(2). – Andreas Wiese Nov 25 '15 at 21:05
  • Good point, Andreas. I updated my answer to say a little more about st_blksize. – dhag Nov 25 '15 at 21:46
  • So I would use stat -f -c %s file to find the block size of the file system where the input file is? Doesn't the block size of the file system where the output file is going matter? And what about copying e.g. /dev/zero to a raw block device? – EmmaV Nov 25 '15 at 23:37
  • Yes, stat -f -c %s file allows you to access that value from the command line. The output file system block size is taken into account; I had elided that part but will expand my answer (the largest of the output size, input size, and hard-coded value is used). I don't see that pseudo-files such as /dev/zero are handled in a special way: GNU cat will also simply use stat on those. – dhag Nov 26 '15 at 15:23