
Background: We run a compute cluster where each job, on node allocation, gets its own tmp directory of the requested size. However, I noticed that I could send two jobs to the same machine whose total requested disk space exceeded what was available. I tracked the problem down to fallocate and mkfs.ext4.

On a test node with 1.1T of disk space available, I create virtual disks to mount the tmp directories on, using fallocate and mkfs.ext4:

# fallocate -l 900G /tmp/disk-test1
# /sbin/mkfs.ext4 -F /tmp/disk-test1
# fallocate -l 900G /tmp/disk-test2
# /sbin/mkfs.ext4 -F /tmp/disk-test2

This creates two files, both (seemingly) 900G in size:

# ll --block-size=G /tmp/
...
-rw-r--r--. 1 root  root 900G Jul  4 14:03 disk-test1
-rw-r--r--. 1 root  root 900G Jul  4 14:03 disk-test2
...

However, df still shows almost the entire disk as available:

# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg.01-lv_root  1.1T  8.6G  1.1T   1% /
...

The /tmp dir:

# df -h /tmp
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg.01-lv_root  1.1T  8.6G  1.1T   1% /

I don't want this to happen. The virtual disks must not be created if there is not enough space left, and once mounted, writes to them should be limited by their size.

What is going on here?

Kisi
  • 33

1 Answer


Yeah, I can reproduce that:

# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /
# fallocate -l 8G test1.disk
# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   22G  5.8G  80% /
# mkfs -text4 test1.disk
mke2fs 1.43.4 (31-Jan-2017)
Discarding device blocks: done                            
Creating filesystem with 2097152 4k blocks and 524288 inodes
...
# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /

Disk usage goes up when the file is fallocated, but back down after mkfs. Note the "Discarding device blocks: done" in the output of mke2fs. (frostschutz mentioned this in the comments.)
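The same effect can be checked on the image file itself rather than through df, by comparing its apparent size with the space actually allocated to it. This is a minimal sketch, assuming GNU stat (for the -c format option); alloc_report is a hypothetical helper name, not something from the tools above.

```shell
# Print a file's apparent size and the bytes actually allocated to it.
# Assumes GNU coreutils stat; alloc_report is an illustrative name only.
alloc_report() {
    apparent=$(stat -c %s "$1")   # size in bytes, as ls reports it
    blocks=$(stat -c %b "$1")     # number of blocks allocated on disk
    blksize=$(stat -c %B "$1")    # size in bytes of each of those blocks
    echo "apparent=$apparent allocated=$((blocks * blksize))"
}
```

Running alloc_report test1.disk right after fallocate and again after mkfs should show the allocated figure dropping well below the apparent size if the blocks were discarded, while ls keeps reporting the full size.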

The mke2fs man page says this about -E discard:

discard
Attempt to discard blocks at mkfs time (discarding blocks initially is useful on solid state devices and sparse / thin-provisioned storage). ... This is set as default.

There's nodiscard to do the opposite, so let's try that:

# df -h .; fallocate -l 8G test2.disk; mkfs -text4 -Enodiscard test2.disk; df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 2097152 4k blocks and 524288 inodes
...
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   22G  5.9G  79% /

Now the space stays allocated, so another fallocate -l 8G fails.
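For the cluster use case, the two steps could be wrapped so that a job's disk is only handed out when the reservation really succeeded. This is a rough sketch rather than a tested provisioning script: make_disk is a hypothetical name, and it assumes fallocate and mkfs.ext4 are on the PATH and that it is run with enough privileges to write the image file.

```shell
# Create a fully reserved ext4 image, or fail without leaving a file behind.
# make_disk is an illustrative helper name.
# usage: make_disk IMAGE SIZE     e.g. make_disk /tmp/disk-test1 900G
make_disk() {
    img=$1
    size=$2
    # fallocate reserves real blocks and fails if they are not available.
    if ! fallocate -l "$size" "$img"; then
        echo "make_disk: cannot reserve $size for $img" >&2
        rm -f "$img"
        return 1
    fi
    # nodiscard keeps mke2fs from releasing the blocks reserved above.
    if ! mkfs.ext4 -q -F -E nodiscard "$img"; then
        echo "make_disk: mkfs.ext4 failed on $img" >&2
        rm -f "$img"
        return 1
    fi
}
```

With this, a second make_disk call that does not fit on the host filesystem should return non-zero, so the scheduler can refuse the job instead of overcommitting the disk.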

ilkkachu
  • 138,973