44

I'm running Arch Linux and use ext4 filesystems.

When I run ls in a directory that is now small but used to be huge, it hangs for a while. The next time I run it, though, it's almost instantaneous.

I tried doing:

strace ls

but I honestly don't know how to debug the output. I can post it if necessary, though it's more than 100 lines long.
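
If it would help, I could also post a narrower trace instead of the full output; I'm guessing the directory-reading calls are the interesting part, so something like:

strace -T -e trace=getdents64,openat ls > /dev/null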

And, no, I'm not using any aliases.

$ type ls
ls is hashed (/usr/bin/ls)

$ df .
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sda9      209460908 60427980 138323220  31% /home

Seamus
  • 2,925
Belen
  • 551
  • How many entries does the directory have? – Cyrus Apr 24 '21 at 22:46
  • Which OS do you use? – Cyrus Apr 24 '21 at 22:46
  • 2
    72 entries. I use Arch. – Belen Apr 24 '21 at 22:47
  • 1
    Is ls -f much faster? Show output of df . in this directory. – Cyrus Apr 24 '21 at 22:48
  • Which file system? ext4? – cg909 Apr 24 '21 at 22:50
  • No, ls -f is pretty much the same. Yes, ext4. Posted the output of df . in an edit. – Belen Apr 24 '21 at 22:54
  • 3
    The answer is related to the above answer, I think. Once the directory entry is cached, ls will be fast, but it remains a large structure. – jsbillings Apr 24 '21 at 23:27
  • Yes, that seems to be the problem. Any ideas how to fix it? – Belen Apr 24 '21 at 23:47
  • strace output won't be useful; very likely the kernel returns all the currently-present entries in one getdents64 system call. (And then another one returns 0 so the readdir(3) library function detects EOF). This is very likely an issue of the EXT4 filesystem being slow to read from disk into Linux's VFS metadata cache, since as you see, once there it's basically instant. – Peter Cordes Apr 25 '21 at 13:06
  • Could you perhaps tell us what you consider to be "extremely long"? Because (unless the system was being completely thrashed by some other process) I've never encountered an ls that takes more time than displaying results to the terminal. – jamesqf Apr 25 '21 at 17:00
  • 7
    @jamesqf The time it takes to retrieve directory entries is directly proportional to the total number of entries to retrieve (though it may be quantized to some extent). It's unusual with fast storage because most directories do not have multiple thousands of entries, but historically this was a bigger issue. Even today, though, it can still happen: at my last job I had to deal with a lack of a proper folder structure on a major fileserver, which resulted in a directory with more than 70000 entries, which would take about 30 seconds to run ls on despite the server having very fast storage. – Austin Hemmelgarn Apr 25 '21 at 22:20
  • @Austin Hemmelgarn: I have to admit not ever having had to deal with pathological cases like yours, but I don't think the PDP-11 (where I first encountered ls) could reasonably be said to have fast storage :-) – jamesqf Apr 26 '21 at 02:27
  • @AustinHemmelgarn: Part of ls being that slow when the files actually existed may be stating each one for aliases that include ls --color=auto or ls -F (/ for directories, * for executable permission on files, | for pipes, etc.) That requires looking at the inodes for each file in the directory, and they're not necessarily contiguous. (If everything's hot in VFS cache, though, even 70k system calls can go by pretty quickly.) – Peter Cordes Apr 26 '21 at 08:22
  • On a side note, you won't face this issue on the XFS file system. (This is not an answer to the question, nor any kind of recommendation.) – Alex Jones Apr 30 '21 at 07:48

2 Answers

63

A directory that used to be huge may still have a lot of blocks allocated for directory entries (= names and inode numbers of files and sub-directories in that directory), although almost all of them are now marked as deleted.

When a new directory is created, only a minimal amount of space is allocated for directory entries. As more and more files are added, new blocks are allocated to hold directory entries as needed. But when files are deleted, the ext4 filesystem does not consolidate the remaining directory entries and release the now-unnecessary directory metadata blocks; the assumption is that they might be needed again soon enough.
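
You can see this directly, without unmounting anything: the reported size of the directory inode itself stays large even after the files in it are gone. For example (the path here is only illustrative):

ls -ld /home/user/formerly-huge-dir        # the size column stays in the megabytes
stat -c '%s bytes in %b blocks' /home/user/formerly-huge-dir   # a fresh directory would show 4096 bytes in 8 blocks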

You might have to unmount the filesystem and run e2fsck -C0 -f -D /dev/sda9 on it to optimize the directories: this deallocates the extra directory metadata blocks and consolidates the existing directory entries into a smaller space.

Since it's your /home filesystem, you might be able to do it by making sure all regular user accounts are logged out, then logging in locally as root (typically on the text console). If umount /home in that situation reports that the filesystem is busy, you can use fuser -m /dev/sda9 to identify the processes blocking you from unmounting /home. If they are remnants of old user sessions, you can probably just kill them; but if they belong to services, you might want to stop those services in a controlled manner.

The other classic way to do this sort of major maintenance to /home would be to boot the system into single-user/emergency mode. On distributions using systemd, the boot option systemd.unit=emergency.target should do it.
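
Putting those steps together, and using the device and mount point from your df output, the whole maintenance session would look roughly like this:

umount /home                  # if this reports "target is busy", see below
fuser -m /dev/sda9            # identify what still has /home open, stop or kill it, retry
e2fsck -C0 -f -D /dev/sda9    # -D rebuilds and compacts the directories
mount /home                   # assumes /home is listed in /etc/fstab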

And as others have mentioned, there is an even simpler solution, if preserving the timestamps of the directory is not important, and the problem directory is not the root directory of the filesystem it's in: create a new directory alongside the "bloated" one, move all files to the new directory, remove the old directory, and rename the new directory to have the same name as the old one did. For example, if /directory/A is the one with the problem:

mkdir /directory/B
mv /directory/A/* /directory/B/      # regular (non-hidden) files and sub-directories
mv /directory/A/.??* /directory/B/   # hidden files/dirs too (note: .??* skips two-character names such as .x)
rmdir /directory/A
mv /directory/B /directory/A
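
If you use bash, an alternative to the .??* pattern is the dotglob shell option, which makes * match hidden names as well (it never matches . or ..), so a single mv is enough:

shopt -s dotglob
mv /directory/A/* /directory/B/
shopt -u dotglob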

Of course, if the directory is being used by any services, it would be a good idea to stop those services first.
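
For example, to check what (if anything) still has files open under the directory before moving things around (the service name below is just a placeholder):

lsof +D /directory/A             # list processes with open files under the directory
systemctl stop example.service   # stop whichever service shows up, if any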

telcoM
  • 96,466
  • 3
    Just fair warning, any systemd service that has ProtectHome enabled will have /home in a private namespace and you won’t be able to unmount /home, and it won’t show up in fuser, because it is a kernel mount. I believe CUPS is one of those services by default. – jsbillings Apr 25 '21 at 02:02
  • 1
    This worked. Though I had to add the -f flag because it was otherwise reporting that the system was clean without actually checking it. – Belen Apr 25 '21 at 08:38
  • 3
    @Belen Thanks for the feedback, I edited the -f flag into the answer. It's been quite a while since I had to do this, so I didn't remember that detail. – telcoM Apr 25 '21 at 09:20
  • "You might have to unmount the filesystem and run a e2fsck" -- Wouldn't it suffice to make a new directory, move all the files to it, rmdir the old one, and rename the new one? – JoL Apr 25 '21 at 15:47
  • 1
    @JoL you're entirely correct. However, as others have noted, that has two limitations: the directory must not be the root directory of the filesystem it's in, and such a move will cause the directory timestamp to change, which may or may not be important. – telcoM Apr 25 '21 at 16:27
  • 1
    You can preserve the timestamps (and everything else) easily enough using rsync instead of mv. – OrangeDog Apr 26 '21 at 08:07
  • 2
    @OrangeDog - cp -a should be more efficient than rsync as no contents will need to be copied (just new directory entries in the target created). Unless there is something cp -a does not preserve that rsync does? – David Spillett Apr 26 '21 at 16:16
  • 2
    Just an important note: the correct "fix" here in nearly all conceivable instances is creating a new directory, moving all the files into it, and then deleting the old directory. It's very unlikely that you'd need to fsck. The mv(1) command preserves timestamps. By design, one should not do any file activity in the root of any mounted filesystem - always do it in a subdirectory. With the exception of /tmp, where it's considered reasonable to remove all files on a reboot. – Brian C Apr 28 '21 at 02:31

45

Out of curiosity, let's try to reproduce this:

$ mkdir test
$ cd test
$ time ls   # Check initial speed of ls
real    0m0.002s
$ stat .    # Check initial size of directory
  File: .
  Size: 4096        Blocks: 8          IO Block: 4096   directory
  ...
$ seq 1 1000000 | xargs touch    # Create lot of files
$ echo 3 | sudo tee /proc/sys/vm/drop_caches   # Clear cache
$ time ls > /dev/null
real    0m1.588s
$ stat .                        # Check size of directory when files are there
  File: .
  Size: 22925312    Blocks: 44776      IO Block: 4096   directory

Ok, so now we have a large directory. Let's remove the files and see what happens:

$ ls | xargs rm   # avoids the "Argument list too long" error that rm * would give
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time ls > /dev/null
real    0m1.242s
$ stat .
 File: .
 Size: 22925312     Blocks: 44776      IO Block: 4096   directory

So yes, the allocated size for the directory does stay large and that does cause slow ls, like telcoM's answer also indicated.

If it is just a single directory with the problem, there is a simpler solution that does not require unmounting or root access: simply create a new directory, move the remaining files into it, remove the bloated directory, and rename the new one into its place.
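
Continuing the experiment above, that fix is easy to verify: after the move and rename, stat should report the same 4096-byte, 8-block size as the freshly created directory did.

$ cd ..
$ mkdir test2
$ mv test/* test2/ 2>/dev/null   # nothing left to move in this experiment
$ rmdir test
$ mv test2 test
$ stat test                      # Size should be back to 4096, Blocks to 8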

jpa
  • 1,269
  • 4
    e2fsck defrag/optimize has the advantage of preserving timestamps including ctime, but yes it's certainly simpler to just mkdir and mv, unless it's the root directory of a filesystem. – Peter Cordes Apr 25 '21 at 13:10
  • 1
    I thought of that, but I wanted to preserve the name. (Creating a new directory, moving everything there, and renaming that directory with mv didn't solve the problem.) Besides, I'm glad I asked because I learned why the problem was occurring. – Belen Apr 25 '21 at 15:44
  • 2
    @Belen That's quite surprising, as the directory entry should be linked to the inode and not the name. – jpa Apr 25 '21 at 15:48
  • 5
    @Belen If you do mkdir B; mv A/* B/; mv B A you'll end up with the files in A/B/* and the A directory still bloated. The rmdir A step before the second mv is important in this case. – telcoM Apr 25 '21 at 16:30
  • Now that you mention it, you're right, I didn't do rmdir A. – Belen Apr 25 '21 at 17:09
  • Why is the "Size" output of stat . an order of magnitude higher after the files are deleted? – GoodDeeds Apr 26 '21 at 00:23
  • It may be better to do cp -al A B and follow it up by rm -r A and mv B A. This way, one preserves all timestamps of A except ctime, I think! – Kapil Apr 26 '21 at 03:45
  • I understand that it should not reduce, but why should it increase? – GoodDeeds Apr 26 '21 at 04:02
  • @GoodDeeds Sorry, it was my copy-paste goof-up. I initially had only 100k files, but thought that the time difference wasn't big enough and reran with 1000k files, but apparently missed updating that column. – jpa Apr 26 '21 at 04:03
  • I see, thanks for the clarification! – GoodDeeds Apr 26 '21 at 04:03
  • "Simply create a new directory, move remaining files to it and remove the bloated one." - I still think this is a bug. Why should I as a user care about when to move or remove a formerly bloated directory? I just need the file system to do its work – Thomas Weller Apr 27 '21 at 12:59
  • @ThomasWeller Yeah, many filesystems do shrink directories automatically, for example btrfs. – jpa Apr 27 '21 at 13:42