Using fallocate and mkfs.ext4 can exceed available disk space

Question

Background: We have a compute cluster, and on node allocation each job gets its own tmp directory of the requested size. However, I noticed that I could send two jobs to the same machine with a total requested disk space greater than what was available. I tracked the problem down to fallocate and mkfs.ext4.

On a test node with 1.1T of disk space available, I create virtual disks on which to mount the tmp directories. Using fallocate and mkfs.ext4:

# fallocate -l 900G /tmp/disk-test1
# /sbin/mkfs.ext4 -F /tmp/disk-test1
# fallocate -l 900G /tmp/disk-test2
# /sbin/mkfs.ext4 -F /tmp/disk-test2

This creates two files, both (seemingly) of size 900G:

# ll --block-size=G /tmp/
...
-rw-r--r--. 1 root  root 900G Jul  4 14:03 disk-test1
-rw-r--r--. 1 root  root 900G Jul  4 14:03 disk-test2
...

yet df reports almost no disk space in use:

# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg.01-lv_root  1.1T  8.6G  1.1T   1% /
...

The /tmp dir:

# df -h /tmp
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg.01-lv_root  1.1T  8.6G  1.1T   1% /

I don't want this to happen. The virtual disks must not be created if there is not enough space left, and once they are mounted, writes to them should be limited by their size.

What is going on here?

Use filefrag to get info about how it is allocated. Also mkfs might discard / TRIM the file. — frostschutz, Jul 05 '18 at 10:06
And discard / TRIM also happens when mounted, if using discard mount option or fstrim. Depending on your distro fstrim might be done by a system-wide cron job or service. Since you're using LVM anyway, why not create LV of desired size rather than disk images inside LV? — frostschutz, Jul 05 '18 at 10:16
Using filefrag disk-test1 outputs "485 extents found". Regarding LV, I will try that. — Kisi, Jul 05 '18 at 10:24
Take a look at "df report incorrect free space for a filesystem (ext4)" on ServerFault. — AlexP, Jul 05 '18 at 13:16

ilkkachu · Accepted Answer · 2018-07-05T15:29:54.203

Yeah, I can reproduce that:

# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /
# fallocate -l 8G test1.disk
# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   22G  5.8G  80% /
# mkfs -text4 test1.disk
mke2fs 1.43.4 (31-Jan-2017)
Discarding device blocks: done                            
Creating filesystem with 2097152 4k blocks and 524288 inodes
...
# df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /

Disk usage goes up when the file is fallocated, but back down after mkfs. Note the "Discarding device blocks: done" in the output of mke2fs. (frostschutz mentioned this in the comments.)
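The same effect can be observed on the file itself rather than through df, by comparing the size that ls reports with the blocks actually allocated. The following is only a sketch: it assumes GNU coreutils (stat -c, truncate, mktemp), the two function names are made up for illustration, and a sparse file made with truncate stands in for an image whose blocks were discarded by mkfs.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils; function names are hypothetical.

# Apparent size in bytes: the figure that ls -l shows.
apparent_bytes() {
    stat -c '%s' "$1"
}

# Allocated size in bytes: the blocks really backed by storage,
# which is what counts against the space df reports.
allocated_bytes() {
    stat -c '%b %B' "$1" | {
        read -r blocks block_size
        echo $((blocks * block_size))
    }
}

# A sparse stand-in for an image file whose blocks were discarded.
img=$(mktemp)
truncate -s 16M "$img"

echo "apparent:  $(apparent_bytes "$img") bytes"
echo "allocated: $(allocated_bytes "$img") bytes"

rm -f "$img"
```

On a real image, du -h and du -h --apparent-size give the same comparison in one line each.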

The man page says of -E discard:

discard
Attempt to discard blocks at mkfs time (discarding blocks initially is useful on solid state devices and sparse / thin-provisioned storage). ... This is set as default.

There's nodiscard to do the opposite, so let's try that:

# df -h .; fallocate -l 8G test2.disk; mkfs -text4 -Enodiscard test2.disk; df -h .
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   14G   14G  51% /
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 2097152 4k blocks and 524288 inodes
...
Filesystem           Size  Used Avail Use% Mounted on
/dev/root             30G   22G  5.9G  79% /

Now the space stays allocated after mkfs, so with only 5.9G available another fallocate -l 8G fails.
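Putting the two steps together, image creation could be wrapped roughly as below. This is a minimal sketch rather than a tested provisioning script: create_image is a hypothetical helper name, and it assumes fallocate and mkfs.ext4 are on PATH (the question calls /sbin/mkfs.ext4 explicitly).

```shell
#!/bin/sh
# Sketch only: create_image is a hypothetical helper, not from the post.
# Usage: create_image PATH SIZE

create_image() {
    img=$1
    size=$2

    # Refuse to touch an existing file, so the cleanup below can never
    # delete something this function did not create.
    if [ -e "$img" ]; then
        echo "create_image: $img already exists" >&2
        return 1
    fi

    # fallocate reserves real blocks and fails when the filesystem
    # does not have enough free space for them.
    if ! fallocate -l "$size" "$img"; then
        echo "create_image: cannot allocate $size for $img" >&2
        rm -f "$img"
        return 1
    fi

    # -E nodiscard stops mke2fs from discarding the blocks that were
    # just reserved; -F is needed because the target is a regular file.
    if ! mkfs.ext4 -q -F -E nodiscard "$img"; then
        echo "create_image: mkfs.ext4 failed on $img" >&2
        rm -f "$img"
        return 1
    fi
}
```

With this, create_image /tmp/disk-test1 900G followed by create_image /tmp/disk-test2 900G should fail on the second call on the 1.1T node from the question. As frostschutz notes in the comments, discard can also happen later if the image is mounted with the discard option or fstrim is run, so those would need to be avoided as well.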

This works, thank you! I didn't notice this in the man page. — Kisi, Jul 05 '18 at 14:53
