22

One program created lots of nested sub-folders. I tried the command rm -fr * to remove them all, but it's very slow. Is there any faster way to delete them all?

Anthon
Lei Hao
  • The fastest method is described here: https://unix.stackexchange.com/questions/37329/efficiently-delete-large-directory-containing-thousands-of-files. perl is by far the fastest for files; then rm to get the empty directories. – SDsolar Aug 17 '17 at 10:11

4 Answers

26

The fastest way to remove them from that directory is to move them out of there; after that, just remove them in the background:

mkdir ../.tmp_to_remove
mv -- * ../.tmp_to_remove
rm -rf ../.tmp_to_remove &

This assumes that your current directory is not the top level of a mounted partition (i.e. that ../.tmp_to_remove is on the same filesystem).

The -- after mv (as edited in by Stéphane) is necessary if you have any file/directory names starting with a -.

The above removes the files from your current directory in a fraction of a second, as it doesn't have to recursively handle the subdirectories. The actual removal of the tree from the filesystem takes longer, but since it is out of the way, its actual efficiency shouldn't matter that much.
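
If you do this often, the same steps can be wrapped in a small shell function. This is only a minimal sketch: the function name quick_clear is illustrative, and it still assumes the parent directory is on the same filesystem.

quick_clear() {
    mkdir ../.tmp_to_remove || return    # staging directory on the same filesystem
    mv -- * ../.tmp_to_remove || return  # near-instant as far as this directory is concerned
    rm -rf ../.tmp_to_remove &           # the real deletion happens in the background
}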

Anthon
  • 79,293
  • @ndemou, never say .*: it matches .. and bad things can happen (but in this case only an error message). – Jasen Nov 06 '17 at 09:19
  • The 2nd command won't match any .dot files. To really delete all files including .dot files use this command: mv -- * .??* .[!.] ../.tmp_to_remove. Note that * matches all non .dot file names, .??* matches all dot files with a file name of 3 or more characters and .[!.] matches all dot files with a file name of exactly 2 characters except .. -- Thanks to @Jasen for leading me to this path (the result is not pretty but is correct). – ndemou Nov 06 '17 at 14:47
  • This won't help if you are out of space and trying to free up some space by deleting a folder which has a significant number of files. The disk and inode usage would remain the same in this approach. – WaughWaugh Apr 03 '19 at 09:20
  • @SayanBose Are you claiming that the third command (rm -rf ...) will not work when you are out of space? In my experience that is untrue: even if the disk is at 100%, the above steps allow you to clean out a directory almost immediately (getting free space back on your disk of course requires more time). – Anthon Apr 03 '19 at 10:07
  • @Anthon, No, rm -rf will work when you are out of disk space. Your mkdir won't work if you are out of disk space or inodes. – WaughWaugh Apr 03 '19 at 10:47
  • “in the background” is thus not actually fast and still requires more computational power than more efficient solutions. The OP already said rm -rf is slow, and your answer does not change that. – MS Berends Dec 30 '21 at 08:29
  • @MSBerends The OP asks "What's the fastest way to remove all files & subfolders in a directory", so you shouldn't count the time it takes to clean up after achieving the goal, or convince the OP of accepting a different answer. – Anthon Dec 30 '21 at 12:23
  • I kindly disagree :) To me "having dinner with the family tonight" includes preparing the meal and cleaning the kitchen afterwards, but that might be personal. I would indeed suggest the OP to accept the answer of Rahul, since using rsync is faster in this case (and an ingenious solution btw) and thus the better answer IMHO. – MS Berends Dec 30 '21 at 19:13
25

rsync is surprisingly fast and simple. You have to create an empty directory first:

mkdir emptydir
rsync -a --delete emptydir/ yourdirectory/

yourdirectory/ is the directory from which you want to remove the files.
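
If you don't want to leave the helper directory lying around, the same idea can be wrapped up with mktemp. This is just a sketch; the function name empty_dir is illustrative and not part of the original answer.

empty_dir() {
    tmp=$(mktemp -d) || return       # throwaway empty source directory
    rsync -a --delete "$tmp"/ "$1"/  # make "$1" identical to the empty directory
    rmdir "$tmp"                     # clean up the helper directory
}

empty_dir yourdirectory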

Rahul
  • Interesting use of rsync. Is it faster than rm? – pfnuesel Apr 18 '16 at 09:17
  • @pfnuesel: Yes, see this answer http://serverfault.com/a/328305/105902. – Rahul Apr 18 '16 at 09:29
  • I've had to copy thousands of files from one drive to another. Using cp it crashed the server, eating up all memory. Rsync did the trick without a problem - although I kept htop open in a separate session to kill it when needed. So rsync can be a very useful tool. – SPRBRN Apr 18 '16 at 09:56
10

The fastest is with rm -rf dirname. I used a snapshotted mountpoint of an ext3 filesystem on Red Hat 6.4 with 140,520 files and 9,699 directories. If rm -rf * is slow, it might be because your top-level directory entry has lots of files and the shell is busy expanding *, which requires an additional readdir and sort. Go up a directory and do rm -rf dirname/.
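
In other words (with dirname standing in for the directory you want to delete):

cd ..            # avoid making the shell expand * in a huge directory
rm -rf dirname/  # let rm read the directory entries itself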

Method                         Real time    Sys time  Variance (+/-)
find dir -delete               0m8.108s     0m3.668s  0.055s
rm -rf dir                     0m7.956s     0m3.640s  0.081s
rsync -a --delete empty/ dir/  0m8.305s     0m3.918s  0.029s

Notes:

  • rsync version: 3.0.6
  • rm/coreutils version: 8.4-19
  • find/findutils version: 4.4.2-6
Otheus
  • Confirmed that. I have 1 million files to delete, spread over 3000 directories that contain thousands of subdirectories inside. Using the find method I was able to delete one directory per minute. Using rm -rf dirname I am able to delete one directory every 2 seconds. I am using this bash command: for d in */; do rm -rf $d; done. Thanks. – Duck Jun 29 '18 at 11:15
9

One problem with rm -rf *, or its more correct equivalent rm -rf -- *, is that the shell first has to list all the (non-hidden) files in the current directory, sort them and pass them to rm. If the list of files in the current directory is big, that adds some unnecessary extra overhead, and it can even fail if the list of files is too big.
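
The failure in question is the familiar "Argument list too long" error, which comes from the system's ARG_MAX limit; you can check it with getconf (the exact value varies by system):

getconf ARG_MAX    # maximum combined size of the arguments and environment passed to a new process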

Normally, you'd do rm -rf . instead (which would also have the benefit of deleting hidden files). But most rm implementations, including all POSIX-conformant ones, will refuse to do that. The reason is that some shells (including all POSIX ones) have the misfeature that the expansion of the .* glob includes . and .., which would mean that rm -rf .* would delete the current and parent directories, so rm has been modified to work around that misfeature of those shells.
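
You can see that misfeature directly: in affected shells (for instance bash before 5.2's globskipdots option), . and .. show up in the expansion of the glob:

printf '%s\n' .*    # in affected shells, the output includes "." and ".."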

Some shells, like pdksh (and other Forsyth shell derivatives), zsh or fish, don't have that misfeature¹. zsh has a rm builtin, which you can enable with zmodload zsh/files; since zsh's .* includes neither . nor .., that builtin works fine when given . as an argument. So in zsh, you can do:

zmodload zsh/files
rm -rf .

On Linux, you can do:

rm -rf /proc/self/cwd/

to empty the current directory or:

rm -rf /dev/fd/3/ 3< some/dir

to empty an arbitrary directory.

(note the trailing /)

On GNU systems, you can do:

find . -delete

Now, if the current directory only has a few entries and the bulk of the files are in subdirs, that won't make a significant difference, and rm -rf -- * will probably be the fastest you can get. It's expected for rm -rf (or anything that removes every file) to be expensive, as it means reading the content of all directories and calling unlink() on every entry. unlink() itself can be quite expensive, as it involves modifying the deleted file's inode, the directory containing the file, and some filesystem map of which areas are free.
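
If you want to see that cost for yourself on Linux, strace can summarise the system calls rm makes while deleting a disposable test tree (dirname here is just a placeholder):

strace -f -c rm -rf dirname    # -c prints a per-syscall count and time summary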

rm and find (at least the GNU implementations) already sort the list of files by inode number in each directory, which can make a huge difference in terms of performance on ext4 file systems, as it reduces the number of changes to the underlying block devices when consecutive (or close to each other) inodes are modified in sequence.

rsync sorts the files by name which could drastically reduce performance unless the by-name order happens to match the by-inum order (like when the files have been created from a sorted list of file names).

One reason why rsync may be faster in some cases is that it doesn't appear to take the safety precautions that rm and find do to avoid race conditions that could cause it to descend into the wrong directory if a directory was replaced with a symlink while it's working.

To optimize a bit further:

If you know the maximum depth of your directory tree, you can pass it to find:

find . -maxdepth 3 -delete

That saves find having to try and read the content of the directories at depth 3.


¹ see also the globskipdots option in bash 5.2+

  • "if the list of file is too big." – fduff Apr 18 '16 at 09:44
  • In the last paragraph you talk about rm -rf being an expensive operation as it calls unlink() on every entry, but is that not what find . -delete would do too? – fduff Apr 18 '16 at 09:46
  • @fduff, yes. Like I say (maybe not clearly), find -delete won't make much difference if there are few files in the current directory. The only difference would be about avoiding creating, sorting and passing around that big list. – Stéphane Chazelas Apr 18 '16 at 09:56