
If I have the following contents in a directory:

  • empty_dir: Empty directory.
  • empty_file: Empty file.
  • one_char: File consisting of one character.
  • several_blocks: File consisting of several blocks (but not "too" large or "sparse").

Then, ls will display the following †:

$ ls -Gghs
total 152K
8,0K drwxr-xr-x 2 4,0K dec 21 23:34 empty_dir
4,0K -rw-r--r-- 1    0 dec 21 23:21 empty_file
8,0K -rw-r--r-- 1    1 dec 21 23:22 one_char
132K -rw-r--r-- 1 127K dec 22 00:14 several_blocks

Secondly, stat displays the following:

$ stat empty_dir/
  File: empty_dir/
  Size: 4096            Blocks: 16         IO Block: 4096   directory
  ...

$ stat empty_file 
  File: empty_file
  Size: 0               Blocks: 8          IO Block: 4096   regular empty file
  ...

$ stat one_char 
  File: one_char
  Size: 1               Blocks: 16         IO Block: 4096   regular file
  ...

$ stat several_blocks 
  File: several_blocks
  Size: 129760          Blocks: 264        IO Block: 4096   regular file
  ...

Thirdly, du displays the following:

$ du -h empty_dir/
8,0K    empty_dir/

$ du -h empty_file 
4,0K    empty_file

$ du -h one_char 
8,0K    one_char

$ du -h several_blocks 
132K    several_blocks

Lastly:

$ tune2fs /dev/nvme0n1p2 -l
...
Block size:               4096
...
Inode size:               256
...

The blocks reported by stat are 512 B each, which means that the output of stat, ls, and du is consistent:

  • empty_dir: 16 * 512 B = 8192 B = 4096 B + 4096 B = 8 KiB.
  • empty_file: 8 * 512 B = 4096 B = 0 B + 4096 B = 4 KiB.
  • one_char: 16 * 512 B = 8192 B = 4096 B + 4096 B = 8 KiB.
  • several_blocks: 264 * 512 B = 135168 B = 129760 B + 5408 B = 129760 B + 1312 B + 4096 B = 131072 B + 4096 B = 32 * 4096 B + 4096 B = 132 KiB.
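The arithmetic above can be sketched in Python. This is only an illustration: the function name `extra_blocks` and the two constants are made up for this example, and it assumes that a file "should" occupy its apparent size rounded up to whole 4096-byte filesystem blocks.

```python
BLOCK_UNIT = 512   # unit of the "Blocks" field printed by stat
FS_BLOCK = 4096    # filesystem block size reported by tune2fs


def extra_blocks(size, blocks):
    """Return (allocated bytes, expected bytes, surplus filesystem blocks)."""
    allocated = blocks * BLOCK_UNIT
    # Round the apparent size up to a whole number of filesystem blocks.
    expected = -(-size // FS_BLOCK) * FS_BLOCK
    return allocated, expected, (allocated - expected) // FS_BLOCK


# (Size, Blocks) pairs copied from the stat output above.
observed = {
    "empty_dir": (4096, 16),
    "empty_file": (0, 8),
    "one_char": (1, 16),
    "several_blocks": (129760, 264),
}

for name, (size, blocks) in observed.items():
    allocated, expected, extra = extra_blocks(size, blocks)
    print(f"{name}: allocated {allocated // 1024} KiB, "
          f"expected {expected // 1024} KiB, surplus blocks: {extra}")
```

Under that assumption, every entry comes out with exactly one surplus 4096-byte block.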

Questions

  1. Why is the allocated size for empty_dir and one_char two blocks (of size 4096 B) and not one?
  2. Why is the allocated size for empty_file one block and not zero?
  3. Why is the allocated size for several_blocks (and larger files in general) more than one block larger than the apparent size ((264 * 512) - 129760 = 5408 > 4096)?

I suspect the additional block is the one containing the inode, as this questioner asks (the question went unanswered). Similarly, this questioner has observed the doubled size, but the question is incorrectly formulated and the answer it received addresses only its other part. However, this answer to a different question suggests that there should be no additional blocks (which was my intuition).

  1. Are our systems incorrectly configured?
  2. Assuming the block containing the inode is counted: when using du on multiple files, does it compensate for counting the inode block several times when multiple inodes are in the same block (one block can contain 4096 / 256 = 16 inodes)?

Appendix

@WumpusQ.Wumbley speculated that it could be extended attributes and this turned out to be the case!

getfattr returns user.com.dropbox.attributes. Turns out the testing directory was a subdirectory deep down in a directory that was symbolically linked into my Dropbox folder. See the accepted answer below.
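For anyone who wants to run the same check without getfattr, here is a rough Python sketch. It relies on `os.listxattr` and `os.getxattr`, which Python only provides on Linux; the helper name `xattr_sizes` is made up for this example, and returning an empty dict when attributes cannot be read is a simplification on my part.

```python
import os
import tempfile


def xattr_sizes(path):
    """Return {attribute name: value length in bytes} for path.

    Returns an empty dict if the platform or filesystem does not
    support extended attributes (a simplification for this sketch).
    """
    if not hasattr(os, "listxattr"):
        return {}
    try:
        names = os.listxattr(path, follow_symlinks=False)
        return {
            name: len(os.getxattr(path, name, follow_symlinks=False))
            for name in names
        }
    except OSError:
        return {}


# Demonstration on a freshly created temporary file.
with tempfile.NamedTemporaryFile() as tmp:
    print(xattr_sizes(tmp.name))
```

Pointing it at one of the test files would show whether any attribute (such as user.com.dropbox.attributes here) is attached, and how large its value is.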


† This uses GNU Core Utilities 8.30 on GNU/Linux with kernel 4.19.1 (Manjaro), on ext4 on an NVMe SSD.

Klorax
  • 244
  • 1
    That's several questions, but the pertinent one can be answered simply by pointing out that coreutils does not *calculate* the sizes: those report the information from the kernel using standard system calls. – Thomas Dickey Dec 22 '18 at 14:32
  • 2
    It looks like basically one good question to me: why does everything have one extra 4K block allocated? And I don't have any good ideas so here's a bad one: extended attributes? These would be stored in the inode if they're small but if they're big they get their own blocks. If they exist, you can read them with getfattr –  Dec 22 '18 at 17:34

2 Answers


@WumpusQ.Wumbley pointed out the cause in a comment: extended attributes.

For completeness' sake, the answers to the individual questions are given below.

Extended attributes, in this case applied by Dropbox (getfattr returns user.com.dropbox.attributes), use additional blocks for storage. Without these extended attributes, ls (and the other commands) returns:

$ ls -Gghs
total 136K
4,0K drwxr-xr-x 2 4,0K dec 22 20:11 empty_dir
   0 -rw-r--r-- 1    0 dec 22 20:11 empty_file
4,0K -rw-r--r-- 1    1 dec 22 20:12 one_char
128K -rw-r--r-- 1 127K dec 22 20:13 several_blocks

As expected.

In addition, stat for the only interesting case of several_blocks returns:

$ stat several_blocks 
  File: several_blocks
  Size: 129760          Blocks: 256        IO Block: 4096   regular file
  ...

Which is also as expected, since 256 * 512 - 129760 = 1312 < 4096, i.e., no extra block used.

  1. Due to extended attributes.
  2. Due to extended attributes.
  3. Due to extended attributes.
  4. No, but be aware of extended attributes added by applications.
  5. Incorrect assumption.
Klorax
  • 244

The "additional blocks" are not due to some inconsistency in configuration. (Hypothetically, something else could always be wrong, such as cosmic rays corrupting your kernel code :-).)

I say this because there is no option to manually tweak details of the calculation of the disk usage for these commands. The commands only convert the disk usage to different units, by multiplying or dividing. The disk usage is obtained by calling the stat() system call. The kernel returns a number of synthetic "blocks", which are always 512 bytes. Nor is there any kernel option that affects how stat() calculates the number of blocks.
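To illustrate, here is a minimal Python sketch that reads the same fields directly via the stat family of system calls. The helper name `disk_usage` is made up for this example, and it assumes a Unix-like system where `st_blocks` is available and counted in 512-byte units, as on Linux.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent size, 512-byte blocks, allocated bytes) for path."""
    st = os.lstat(path)  # lstat: do not follow symbolic links
    return st.st_size, st.st_blocks, st.st_blocks * 512


# Demonstration on a one-character temporary file.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x")
    tmp.flush()
    size, blocks, allocated = disk_usage(tmp.name)
    print(f"Size: {size}  Blocks: {blocks}  Allocated: {allocated} B")
```

ls -s and du do essentially the same thing and then only convert the result to the requested unit.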

I can tell you the block which contains the inode is not supposed to be counted on your ext4 filesystem. In general, Giles says it is not counted on any filesystem that he is aware of. Perhaps in part due to the point you raise :-). Inodes tend to be smaller than the 512-byte blocks reported by stat. ext4 defaults to 256-byte inodes; ext3 defaulted to 128 bytes.

If we look through the related questions (right sidebar), we notice one case where there can be additional blocks. The extent tree (or indirect blocks, if extents are disabled) is counted on ext4. (Why is the difference in file size and it's size on disk bigger than 4 KiB?)

A second answer to the linked question suggests another case. Some uses of fallocate() might allow creating files with an arbitrarily large difference between their size and the number of blocks allocated to them.

That said, I suspect the above is not sufficient to explain any of your examples.

sourcejedi
  • 50,249
  • 1
    The fallocate hypothesis is interesting, but it doesn't appear to work on directories. –  Dec 22 '18 at 16:43