
I have a filesystem with many small files that I erase regularly (the files are a cache that can easily be regenerated). It's much faster to simply create a new filesystem than to run rm -rf or rsync to delete all the files (see Efficiently delete large directory containing thousands of files).

The only issue with creating a new filesystem to wipe the data is that the filesystem's UUID changes, which requires changes to e.g. /etc/fstab.

Is there a way to simply "unlink" a directory from e.g. an ext4 filesystem, or completely clear its list of inodes?

davidvandebunte
  • Have you considered switching to btrfs and using btrfs subvolume create + btrfs subvolume delete instead of deleting and re-creating ext4 partitions? Deleting a btrfs subvolume is also an efficient way to delete a huge number of files. – hanshenrik Jan 30 '22 at 00:45
  • btrfs subvolumes are specified in fstab by their name, like UUID=0b56138b-6124-4ec4-a7a3-7c503516a65c /data1 btrfs subvol=data1, where the UUID is the global btrfs UUID, separate from the subvolume being mounted. Hence you'd never need to touch fstab when deleting/creating subvolumes with the same name. – hanshenrik Jan 30 '22 at 00:55
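The subvolume approach from these comments might be scripted along the following lines. This is a sketch that makes several assumptions. The top-level btrfs volume (subvolid=5) is mounted at a hypothetical /mnt/btrfs-top. The cache subvolume is data1, and fstab mounts it at /data1 with subvol=data1, as in the example entry. The helper name btrfs_reset_plan is hypothetical. The code only prints the commands, so you can review them and then run them as root.

```python
import shlex
from pathlib import PurePosixPath


def btrfs_reset_plan(top_level: str, subvol: str, mount_point: str) -> list[list[str]]:
    """Commands that replace a btrfs subvolume with an empty one of the same name.

    Assumes the top-level volume (subvolid=5) is mounted at `top_level` and that
    /etc/fstab mounts `subvol` at `mount_point` using subvol=<name>.
    """
    path = str(PurePosixPath(top_level) / subvol)
    return [
        ["umount", mount_point],                  # release the mounted subvolume
        ["btrfs", "subvolume", "delete", path],   # remove the subvolume and everything in it
        ["btrfs", "subvolume", "create", path],   # same name, so the fstab entry still matches
        ["mount", mount_point],                   # remount via the fstab entry
    ]


if __name__ == "__main__":
    # Hypothetical paths; prints the commands rather than running them.
    for cmd in btrfs_reset_plan("/mnt/btrfs-top", "data1", "/data1"):
        print(shlex.join(cmd))
```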

6 Answers

22

Since you're using ext4, you could format the filesystem and then set the UUID to a known value.

man tune2fs writes,

-U UUID Set the universally unique identifier (UUID) of the filesystem to UUID. The format of the UUID is a series of hex digits separated by hyphens, like this c1b9d5a2-f162-11cf-9ece-0020afc76f16.

And similarly, man mkfs.ext4 writes,

-U UUID Set the universally unique identifier (UUID) of the filesystem to UUID. […as above…]

Personally, I prefer to reference filesystems by label. For example, in the /etc/fstab for one of my systems I have entries like this:

# <file system>    <mount point>   <type>   <options>           <dump> <pass>
LABEL=root         /               ext4     errors=remount-ro   0      1
LABEL=backup       /backup         ext4     defaults            0      2

Such labels can be set with the -L flag of tune2fs and mkfs.ext4. They avoid the problems that reusing a UUID can cause with inode checksums (old inodes being rediscovered, or corruption, on a reformatted filesystem), and they are considerably easier to identify visually. (But they are highly unlikely to be unique across multiple systems, so beware if swapping disks around.)
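Here is a rough illustration of why the choice of identifier matters. The hypothetical Python helper below classifies the first field of each fstab entry and notes whether it keeps matching after the filesystem is re-created with mkfs. It assumes that the same -L label is passed again for LABEL= entries and that the partition table is left alone for PARTUUID=/PARTLABEL= entries. It does not handle quoted values.

```python
from typing import Optional

# Whether each fstab source form still matches after re-running mkfs on the partition.
SURVIVES_MKFS = {
    "LABEL": True,      # provided mkfs is given the same -L label
    "UUID": False,      # mkfs generates a new UUID unless given -U
    "PARTUUID": True,   # stored in the partition table, which mkfs does not touch
    "PARTLABEL": True,  # likewise a partition-table attribute
}


def fstab_entries(text: str) -> list[tuple[str, str, Optional[bool]]]:
    """Return (mount point, source kind, survives mkfs) for each fstab entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments define no mount
        source, mount_point = line.split()[:2]
        kind = source.split("=", 1)[0] if "=" in source else "path"
        # Plain /dev paths and unknown forms get None: they need a closer look.
        entries.append((mount_point, kind, SURVIVES_MKFS.get(kind)))
    return entries


if __name__ == "__main__":
    sample = """\
LABEL=root         /               ext4     errors=remount-ro   0      1
UUID=c1b9d5a2-f162-11cf-9ece-0020afc76f16 /cache ext4 defaults 0 2
"""
    for entry in fstab_entries(sample):
        print(entry)
    # ('/', 'LABEL', True)
    # ('/cache', 'UUID', False)
```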

Chris Davies
17

You can use -U with mkfs.ext4 to specify your own UUID so when re-creating the filesystem you can simply reuse the previous UUID.
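Here is a rough Python sketch of that procedure. It assumes that /dev/sdX1 stands for your cache partition, that the filesystem is unmounted before mkfs runs, and that the helper names are hypothetical. It reads the current UUID with blkid before the filesystem is destroyed, then builds an mkfs.ext4 -U command that reuses it.

```python
import shlex
import subprocess
import uuid


def current_uuid(device: str) -> str:
    """Read the filesystem UUID of `device` with blkid (usually needs root)."""
    out = subprocess.run(
        ["blkid", "-s", "UUID", "-o", "value", device],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return str(uuid.UUID(out))  # ValueError if blkid printed no usable UUID


def mkfs_with_uuid(device: str, fs_uuid: str) -> list[str]:
    """Build an mkfs.ext4 command that re-creates the filesystem with a fixed UUID."""
    # uuid.UUID validates the hyphenated hex form and normalises it to lower case.
    return ["mkfs.ext4", "-U", str(uuid.UUID(fs_uuid)), device]


if __name__ == "__main__":
    dev = "/dev/sdX1"  # hypothetical device; call current_uuid(dev) before reformatting
    print(shlex.join(mkfs_with_uuid(dev, "c1b9d5a2-f162-11cf-9ece-0020afc76f16")))
```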

17

Reusing UUIDs on ext3/ext4 for a new file system is a bad idea.

On ext4 with the metadata_csum feature (enabled by default by recent versions of mke2fs), the UUID is used in the inode checksum calculation. This allows fsck to distinguish between inodes that belong to the current file system and inodes from a previous file system.

If you reformat an existing extended filesystem with the same UUID, there is a good chance that a file system check will find old data, and attempt to rescue it. Usually, this is an interactive process, so the non-interactive automatic file system check at boot will fail and you are then dropped into a rescue shell.
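The following toy Python model makes the checksum argument concrete. It is not the real on-disk format. ext4's metadata_csum uses CRC32C with a seed derived from the filesystem UUID, and this sketch simply runs the UUID bytes and then a metadata block through CRC32C. The helper names and the byte layout are assumptions for illustration. The model shows that a stale block left over from an earlier format still "verifies" when the new filesystem reuses the UUID, and stops verifying once the UUID differs.

```python
import uuid


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), the checksum family used by ext4 metadata_csum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def toy_block_checksum(fs_uuid: str, block: bytes) -> int:
    """Toy model: seed the checksum with the filesystem UUID, then cover the block."""
    seed = crc32c(uuid.UUID(fs_uuid).bytes)
    return crc32c(block, seed)


def looks_valid(fs_uuid: str, block: bytes, stored_csum: int) -> bool:
    """What a checker would conclude about a block it finds on the device."""
    return toy_block_checksum(fs_uuid, block) == stored_csum


if __name__ == "__main__":
    old_fs = "c1b9d5a2-f162-11cf-9ece-0020afc76f16"
    stale = b"inode written by the previous format"
    stored = toy_block_checksum(old_fs, stale)  # checksum the old filesystem wrote
    print(looks_valid(old_fs, stale, stored))                                  # same UUID reused
    print(looks_valid("0b56138b-6124-4ec4-a7a3-7c503516a65c", stale, stored))  # fresh UUID
```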

If your partition table is GPT, then you can use the partition UUID instead in the fstab, or if you use LVM, you also get a persistent name through that.
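The PARTUUID lives in the GPT partition entry rather than in the filesystem, so mkfs leaves it unchanged. On systems with udev, /dev/disk/by-partuuid contains one symlink per partition, named after its PARTUUID. The small sketch below looks up which device node currently carries a given PARTUUID. The helper name is hypothetical, and the directory is a parameter so that the function can be tried out on a scratch directory.

```python
import os
from pathlib import Path


def device_for_partuuid(partuuid: str, by_partuuid_dir: str = "/dev/disk/by-partuuid") -> str:
    """Resolve a PARTUUID to its current device node via the udev symlink directory."""
    link = Path(by_partuuid_dir) / partuuid
    if not link.is_symlink():
        raise FileNotFoundError(f"no partition with PARTUUID {partuuid}")
    return os.path.realpath(link)  # e.g. /dev/sdb1; unaffected by re-running mkfs
```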

14

Instead of re-using the UUID for each new filesystem, which has some negative implications for e2fsck, you could change your /etc/fstab to use the filesystem label to mount ("LABEL=testfs" instead of "UUID=...."), and then specify the label at format time with "mke2fs -L testfs ...".
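Here is a sketch of the resulting reset routine. It assumes that fstab has a LABEL=testfs entry for the mount point and that /dev/sdX1 and /cache are placeholders. The function name is hypothetical. It only builds the command list. ext2/3/4 volume labels are limited to 16 bytes, so the helper checks the label length up front.

```python
import shlex

EXT4_LABEL_MAX = 16  # ext2/3/4 volume labels hold at most 16 bytes


def relabel_reset_plan(device: str, mount_point: str, label: str) -> list[list[str]]:
    """Commands that wipe a cache filesystem by re-creating it with the same label."""
    if len(label.encode()) > EXT4_LABEL_MAX:
        raise ValueError(f"ext4 labels are at most {EXT4_LABEL_MAX} bytes: {label!r}")
    return [
        ["umount", mount_point],
        # -F forces creation (see man mke2fs); -L re-applies the label fstab relies on
        ["mke2fs", "-t", "ext4", "-F", "-L", label, device],
        # mount finds the device through the LABEL= entry in /etc/fstab
        ["mount", mount_point],
    ]


if __name__ == "__main__":
    for cmd in relabel_reset_plan("/dev/sdX1", "/cache", "testfs"):
        print(shlex.join(cmd))
```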

LustreOne
  • "...which has some negative implications for e2fsck..." For example? – Heinzi Jan 29 '22 at 13:07
  • For example, the UUID is used in the checksums for the metadata_csum feature, so reusing it can cause e2fsck to incorrectly think that stale metadata blocks from a previous format of the filesystem contain valid metadata. That is the primary reason the UUID is used in the checksum: to avoid false positives for stale blocks when reformatting the filesystem on the same block device. The risk isn't severe, since e2fsck is quite robust, but it is unnecessary in this case. – LustreOne Jan 30 '22 at 21:15
8

I'd simply put the file system on an LVM thin pool and take a thin snapshot right after creation. Then I'd just roll back to that snapshot instead of recreating the file system. Storage-wise, this has nearly no overhead.
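One way to script the rollback is to keep a pristine thin snapshot that is never mounted. You would take it right after mkfs, e.g. with lvcreate --snapshot --name cache_clean vg0/cache. On each reset, the working volume is replaced with a fresh snapshot of that pristine one. This untested sketch assumes a volume group vg0, a working thin LV cache mounted at /cache and a pristine snapshot cache_clean; all of these names, and the helper name, are hypothetical. It also swaps in a "remove and re-snapshot" approach rather than lvconvert --merge; check lvmthin(7) before relying on it.

```python
import shlex


def thin_rollback_plan(vg: str, working: str, pristine: str, mount_point: str) -> list[list[str]]:
    """Commands that reset `working` to a fresh thin snapshot of `pristine`.

    `pristine` is assumed to be a thin snapshot taken right after mkfs and never
    mounted; `working` is the thin LV that fstab mounts at `mount_point`.
    """
    return [
        ["umount", mount_point],
        ["lvremove", "-y", f"{vg}/{working}"],  # drop the used-up volume
        # Without a size, lvcreate --snapshot of a thin LV makes a thin snapshot.
        # Thin snapshots skip activation by default, so turn that off here.
        ["lvcreate", "--snapshot", "--setactivationskip", "n",
         "--name", working, f"{vg}/{pristine}"],
        ["mount", mount_point],
    ]


if __name__ == "__main__":
    for cmd in thin_rollback_plan("vg0", "cache", "cache_clean", "/cache"):
        print(shlex.join(cmd))
```

Every new snapshot carries the filesystem UUID written by the original mkfs, and it keeps the same LV name, so the fstab entry never needs to change.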

You could also simply take an image of your file system right after creation and save it for later restoration with dd. This is especially attractive if the underlying storage is an SSD, as mkfs will use discard to mark the blocks as zero/unused. Creating the image will then be fast, the image file will be mostly sparse and hence need very little actual storage, and restoration will be blazingly fast as well.
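For saving such an image, the sketch below copies a device or file into a sparse image file. It seeks over all-zero chunks instead of writing them. The function name and the 1 MiB chunk size are assumptions. The hole-skipping is only appropriate when writing the image file. When restoring onto the device, the zero regions must actually be written (or the device discarded first), otherwise skipped regions would keep the old cache data.

```python
import os

CHUNK = 1 << 20  # 1 MiB; an assumption, any multiple of the fs block size works


def sparse_copy(src: str, dst: str, chunk_size: int = CHUNK) -> int:
    """Copy src into a new file dst, leaving holes for all-zero chunks.

    Returns the number of bytes actually written (excluding holes).
    """
    written = 0
    zero = bytes(chunk_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(chunk_size):
            if chunk == zero[: len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # leave a hole instead of writing zeros
            else:
                fout.write(chunk)
                written += len(chunk)
        fout.truncate()  # set the final size, even if the data ends in a hole
    return written
```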

Note that, depending on the size of the files, there are actually significant differences in speed between file systems (see the benchmarks at the bottom of the page). For medium-sized files, XFS might be faster!

But there's an easier solution, if you're used to mkfs on demand: mkfs.ext4 -U ${your_uuid_here} lets you set the UUID of a file system :)

1

Another option to consider is changing your fstab to use a label or partition UUID (PARTUUID) instead of the filesystem UUID. In this way, you can reformat the partition however you like, and it will still get mounted correctly.