283

We have an issue with a folder becoming unwieldy with hundreds of thousands of tiny files.

There are so many files that performing rm -rf returns an error; instead, what we need to do is something like:

find /path/to/folder -name "filenamestart*" -type f -exec rm -f {} \;

This works but is very slow and constantly fails from running out of memory.

Is there a better way to do this? Ideally I would like to remove the entire directory without caring about the contents inside it.

Wildcard
  • 36,499
Toby
  • 3,993
  • 31
    rm -rf * in the folder probably fails because of too many arguments; but what about rm -rf folder/ if you want to remove the entire directory anyways? – sr_ Apr 26 '12 at 08:01
  • From memory that is what I was doing, I think because it recurses in to build out the list of files to delete before it deletes them? – Toby Apr 26 '12 at 08:09
  • 12
    Just out of curiosity - how many files does it take to break rm -rf? – jw013 Apr 26 '12 at 11:37
  • 10
    You should probably rename the question to something more accurate, like "Efficiently delete large directory containing thousands of files." In order to delete a directory and its contents, recursion is necessary by definition. You could manually unlink just the directory inode itself (probably requires root privileges), unmount the file system, and run fsck on it to reclaim the unused disk blocks, but that approach seems risky and may not be any faster. In addition, the file system check might involve recursively traversing the file system tree anyways. – jw013 Apr 26 '12 at 13:27
  • 8
    Once I had a ccache file tree so huge, and rm was taking so long (and making the entire system sluggish), it was considerably faster to copy all other files off the filesystem, format, and copy them back. Ever since then I give such massive small file trees their own dedicated filesystem, so you can mkfs directly instead of rm. – frostschutz Jun 15 '13 at 11:43
  • 1
    @jw013 see this question on SO -- it varies from system to system (and it's a bash limitation rather than an rm limitation), you can find out what your limit is with echo "$(getconf ARG_MAX)/4-1" | bc (mine comes to 524287 arguments, which I've tested and found to be correct). – evilsoup Jun 27 '13 at 23:01
  • 4
    I find it implausible that find would fail due to running out of memory, since it executes rm immediately for each matching file, rather than building up a list. (Even if your command ended with + rather than \;, it would run rm in reasonably sized batches.) You would have to have a ridiculously deep directory structure to exhaust memory; the breadth shouldn't matter much. – 200_success Aug 31 '13 at 06:07
  • 7
    Instead of deleting it manually, I suggest having the folder on a separate partition and simply unmount && format && remount. – bbaja42 Apr 26 '12 at 11:22
  • 1
    The reason it is always quite slow with millions of files is that the filesystem must update its directory metadata and linked lists after each file is removed. It would be much faster if you could tell the filesystem that you don't need the entire directory, so it would throw out entire metadata at once. – Marki555 Jun 26 '15 at 21:46
  • Use the perl script in one of the answers, then rm to get the rest of it. WAY fast. – SDsolar Aug 17 '17 at 09:55
Note that at some point you're going to run into the physical limit of disk speed. Both rsync -a --delete and find ... -type f -delete run at the same speed for me on an old RHEL 5.10 system for that reason. – RonJohn Mar 03 '18 at 19:13
  • If you don't want to wait, or you just need to get rid of the folder fast, mv is always faster than anything else. Just mv folder_to_be_deleted /tmp/trash then reboot. Files in the /tmp directory will be deleted upon your next reboot. – JB Juliano Nov 29 '22 at 14:35

24 Answers

385

Using rsync is surprisingly fast and simple.

mkdir empty_dir
rsync -a --delete empty_dir/    yourdirectory/

@sarath's answer mentioned another fast choice: Perl! Its benchmarks show it to be faster than rsync -a --delete.

cd yourdirectory
perl -e 'for(<*>){((stat)[9]<(unlink))}'

or, without the stat (it's debatable whether it is needed; some say it may be faster with it, others that it's faster without):

cd yourdirectory
perl -e 'for(<*>){unlink}'

Sources:

  1. https://stackoverflow.com/questions/1795370/unix-fast-remove-directory-for-cleaning-up-daily-builds
  2. http://www.slashroot.in/which-is-the-fastest-method-to-delete-files-in-linux
  3. https://www.quora.com/Linux-why-stat+unlink-can-be-faster-than-a-single-unlink/answer/Kent-Fredric?srid=O9EW&share=1
  • 4
    Thanks, very useful. I use rsync all the time, I had no idea you could use it to delete like this. Vastly quicker than rm -rf – John Powell Aug 21 '14 at 19:41
I ran the exact same command, but I don't see the effect of the delete. There is no error. The return status of the rsync command is 0. Has anyone seen such no-effect behaviour? – mtk May 15 '15 at 09:38
  • 34
rsync can be faster than plain rm, because it guarantees the deletes in the correct order, so less btree recomputation is needed. See this answer http://serverfault.com/a/328305/105902 – Marki555 Jun 29 '15 at 12:45
  • 20
    Can anyone modify the perl expression to recursively delete all directories and files inside a directory_to_be_deleted ? – Abhinav Oct 06 '15 at 15:43
  • 17
Notes: add the -P option to rsync for some more display; also, be careful about the syntax, the trailing slashes are mandatory. Finally, you can run the rsync command a first time with the -n option to launch a dry run. – Drasill Oct 23 '15 at 15:39
  • If you're looking at iotop while removing the files, you might find this interesting: iotop showing 1.5 MB/s of disk write, but all programs have 0.00 B/s – Franck Dernoncourt Dec 09 '15 at 02:26
  • 2
    -a equals -rlptgoD, but for deletion only -rd is necessary – Koen. Mar 19 '16 at 14:36
  • 1
@mtk use the -P option to see what's going on; it does seem to do nothing for a while, when it builds the file list. – Joel Davey Dec 09 '16 at 09:22
  • 2
    Warning: Attached Perl code is possibly suboptimal, as some of the operations used don't have any reasonable justification. The cited article doesn't know why either, and the "stat" call demonstrably slows things down a small amount under testing: https://www.quora.com/Linux-why-stat+unlink-can-be-faster-than-a-single-unlink/answer/Kent-Fredric?srid=O9EW&share=1 – Kent Fredric Feb 09 '17 at 16:23
  • Yay for Sarah. That perl one is very surprising how fast it is. I then clean it up with a final rm -rf <target> to get all the directories. PLUS, I then have to check the trash from previous efforts. Thank you for this. Way beyond what I could have written. I do not understand the [9] part. And I had to catch that you used real apostrophes instead of reverse ticks. And it works. Thank you from August 2017 - Ubuntu 16.04 LTS – SDsolar Aug 17 '17 at 09:52
This is especially helpful when you are deleting an entire kernel source code directory. With rm -rf *, it would take more than 10 minutes (I actually never reached the end), but with the rsync example, it was a few seconds of total time for an entire Linux kernel source code directory. Good job on the answer. – Terry Oct 06 '17 at 19:31
You can even shorten it to perl -e 'for(</path/to/your/dir/*>){((stat)[9]<(unlink))}' – EvgenyKolyakov Oct 07 '19 at 16:58
  • 7
That perl command doesn't work – codenamezero Nov 27 '19 at 17:40
I started a tar command last night and it is still stuck on a PHP sessions folder. It is better to remove the sessions folder. But I am using OpenVZ, where the rm -rf command takes a lot of CPU and gets terminated for abuse. Using rsync is lightweight and does not go above 1% CPU usage. IUsed nodes from df -i shows 24710900. – Rehmat Dec 23 '20 at 03:48
  • stat() call in the Perl one-liner seems to be useless (its return value is compared to return value of unlink() which is pointless), so even more optimal version is perl -e 'for(<*>){unlink}' – Andrey May 10 '21 at 04:30
  • @Andrey re your suggested edit, for some people at least combining stat and unlink results in faster deletion; have you had a chance to benchmark the two approaches? – Stephen Kitt May 10 '21 at 06:59
  • @StephenKitt, no, I haven't had a chance to benchmark it yet, but I'll give it a try. If I read it correctly, Kent in his post found only one case when using stat+unlink is faster than solely unlink. – Andrey May 11 '21 at 16:54
  • Woooohooo the perl line deleted ~500K files in 15 seconds!! That one goes directly to my toolbox – Rodrigo Aug 11 '21 at 21:49
  • 4
    On ubuntu 18.04, perl just seems to run and do nothing – Freedo Jan 20 '22 at 05:51
  • 8
    Ubuntu 20.04 that perl command does nothing. Does anybody have a recursive perl variant? And is there any way to get a progress bar for rsync? I tried -P and --info=progress2 but no progress bar. – dreamflasher Feb 21 '22 at 16:31
  • rsync method also helped me with Value too large for defined data type error – Sergey Jul 11 '22 at 13:00
54

Someone on Twitter suggested using -delete instead of -exec rm -f {} \;

This improved the efficiency of the command, though it still uses recursion to go through everything.
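For reference, a minimal sketch of that form, reusing the path and pattern from the question:

find /path/to/folder -name "filenamestart*" -type f -delete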

Stephen Kitt
  • 434,908
Toby
  • 3,993
36

A clever trick:

rsync -a --delete empty/ your_folder/

It's super CPU intensive, but really really fast. See https://web.archive.org/web/20130929001850/http://linuxnote.net/jianingy/en/linux/a-fast-way-to-remove-huge-number-of-files.html

Qtax
  • 105
MZAweb
  • 553
It's not so fast, because it reads the directory contents inefficiently. See this answer for a 10x faster solution and explanation http://serverfault.com/a/328305/105902 – Marki555 Jun 29 '15 at 12:46
  • 2
    @Marki555: in the Edit of the question it is reported 60 seconds for rsync -a --delete vs 43 for lsdent. The ratio 10x was for time ls -1 | wc -l vs time ./dentls bigfolder >out.txt (that is a partially fair comparison because of > file vs wc -l). – Hastur Jan 21 '16 at 09:30
  • The problem there is that NONE of the commands over there actually DO the desired traversal operation for deletion. The code they give? DOES NOT WORK as described by Marki555. – Svartalf Sep 10 '18 at 16:05
22

What about something like: find /path/to/folder -name "filenamestart*" -type f -print0 | xargs -0rn 20 rm -f

You can limit the number of files deleted at once by changing the argument to the -n parameter. File names containing blanks are handled as well.

  • 2
    You probably don't need the -n 20 bit, since xargs should limit itself to acceptable argument-list sizes anyway. – Useless Apr 26 '12 at 13:41
Yes, you are right. Here is a note from man xargs: (...) max-chars characters per command line (...). The largest allowed value is system-dependent, and is calculated as the argument length limit for exec. So the -n option is for cases where xargs cannot determine the CLI buffer size or where the executed command has some limits. – digital_infinity Apr 26 '12 at 13:50
16

Expanding on one of the comments, I do not think you're doing what you think you're doing.

First I created a huge amount of files, to simulate your situation:

$ mkdir foo
$ cd foo/
$ for X in $(seq 1 1000);do touch {1..1000}_$X; done

Then I tried what I expected to fail, and what it sounds like you're doing in the question:

$ rm -r foo/*
bash: /bin/rm: Argument list too long

But this does work:

$ rm -r foo/
$ ls foo
ls: cannot access foo: No such file or directory
Izkata
  • 709
  • 8
    This is the only solution that worked: Run rm -Rf bigdirectory several times. I had a directory with thousands of millions of subdirectories and files. I couldn’t even run ls or find or rsync in that directory, because it ran out of memory. The command rm -Rf quit many times (out of memory) only deleting part of the billions of files. But after many retries it finally did the job. Seems to be the only solution if running out of memory is the problem. – erik Apr 09 '14 at 13:01
14

Use rm -rf directory instead of rm -rf *.

We were initially doing rm -rf * while in the directory to clear the contents and thought that was as fast as it could get. But then one of our senior engineers suggested we avoid using the asterisks (*) and instead pass in the parent directory, like rm -rf directory.

After some heavy debate about how that wouldn't make a difference, we decided to benchmark it, along with a third method of using find. Here are the results:

time rm -rf *                   2m17.32s
time rm -rf directory           0m15.60s
time find directory -delete     0m16.97s

rm -rf directory is about 9 TIMES FASTER than rm -rf *!

Needless to say, we bought that engineer a beer!

So now we use rm -rf directory; mkdir directory to delete the directory and re-create it.

  • 2
The problem is that * does a shell expansion, which means: (a) it reads the entire directory, and then (b) sorts all the filenames, even before the command is invoked. Using ls -1 -U reads the directory in serial order. You can head -n 10000 and get a list to send to xargs rm. And because those names are all serial in the first part of the directory, they get deleted efficiently too. Just put that in a loop until no files are left, and it works pretty well (see the sketch after these comments). – Paul_Pedant Nov 15 '19 at 21:05
  • Thanks for the reasoning @Paul_Pedant! – Joshua Pinter Nov 15 '19 at 21:21
  • Not the fastest option: https://yonglhuang.com/rm-file/ – Philippe Remy Dec 22 '21 at 01:57
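Below is a minimal sketch of the batching loop Paul_Pedant describes above; the directory path and batch size are illustrative, and it assumes GNU ls and xargs, file names without embedded newlines, and a flat directory of plain files (the scenario in the question), since rm -f skips subdirectories and they would keep the loop going:

cd /path/to/directory || exit 1
while :; do
    # unsorted listing, first 10000 names (dotfiles are not listed by ls -1 -U)
    batch=$(ls -1 -U | head -n 10000)
    [ -z "$batch" ] && break
    printf '%s\n' "$batch" | xargs -d '\n' rm -f --
done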
12

I had the opportunity to test -delete as compared to -exec rm \{\} \; and for me -delete was the answer to this problem.

Using -delete deleted the files in a folder of 400,000 files at least 1,000 times faster than rm.

The 'How to delete large number of files in linux' article suggests it is about three times faster, but in my test the difference was much more dramatic.
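As a rough illustration of the two forms being compared (the path is made up for the example):

# one rm process per matching file:
find /path/to/folder -type f -exec rm {} \;

# find unlinks the files itself, no external process per file:
find /path/to/folder -type f -delete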

slm
  • 369,824
  • 5
    Using find -exec executes the rm command for every file separately, that's why it is so slow. – Marki555 Jun 26 '15 at 21:43
  • 1
    With GNU find, this is where -exec rm {} \+ comes in handy (specifically the \+ in place of \;), as it works like a built-in xargs without the minimal pipe and fork overhead. Still slower than other options, though. – dannysauer Dec 02 '19 at 23:12
@dannysauer execplus was invented in 1988 by David Korn at AT&T, and GNU find was the last find implementation to add support - more than 25 years later. BTW: the speed difference between the standard execplus and the nonstandard -delete is minimal. – schily Feb 22 '20 at 10:09
  • @schily, that's interesting, and I'm a huge fan of Korn's work. However, the answer we're commenting on suggests that testing was happening on Linux. "GNU find" was specified to distinguish from other possible minimal Linux implementations, like busybox. :) – dannysauer Feb 23 '20 at 04:12
8

About the -delete option above: I'm using it to remove a large number (1M+ est.) of files in a temp folder that I created and inadvertently forgot to clean up nightly. I filled my disk/partition accidentally, and nothing else could remove them but the find . command. It is slow; at first I was using:

find . -ls -exec rm {} \;

But that was taking an EXTREME amount of time. It only started removing some of the files after about 15 minutes, and my guess is that it was removing fewer than 10 or so per second after it finally got going. So I tried:

find . -delete

instead, and I'm letting it run right now. It appears to be running faster, though it's EXTREMELY taxing on the CPU, which the other command was not. It's been running for about an hour now; I think I'm getting space back on my drive and the partition is gradually "slimming down", but it's still taking a very long time. I seriously doubt it's running 1,000 times faster than the other. As in all things, I just wanted to point out the tradeoff in space vs. time. If you have the CPU bandwidth to spare (we do), then run the latter. It's got my CPU running (uptime reports):

10:59:17 up 539 days, 21:21,  3 users,  load average: 22.98, 24.10, 22.87

And I've seen the load average go over 30.00, which is not good for a busy system, but for ours, which is normally lightly loaded, it's OK for a couple of hours. I've checked most other things on the system and they're still responsive, so we are OK for now.

Scotty
  • 81
  • 3
if you're going to use exec you almost certainly want to not use -ls and do find . -type f -exec rm '{}' +; + is faster because it will give as many arguments to rm as it can handle at once. – xenoterracide Jan 03 '14 at 17:48
I think you should go ahead and edit this into its own answer… it's really too long for a comment. Also, it sounds like your filesystem has fairly expensive deletes, curious which one it is? You can run that find … -delete through nice or ionice, that may help. So might changing some mount options to less-crash-safe settings. (And, of course, depending on what else is on the filesystem, the quickest way to delete everything is often mkfs.) – derobert Jan 04 '14 at 07:24
  • 3
    Load average is not always CPU, it's just a measure of the number of blocked processes over time. Processes can block on disk I/O, which is likely what is happening here. – Score_Under Jul 14 '14 at 12:47
  • Also note that load average does not account for number of logical CPUs. So loadavg 1 for single-core machine is the same as loadavg 64 on 64-core system - meaning each CPU is busy 100% of time. – Marki555 Jun 29 '15 at 12:49
6

There are a couple of methods that can be used to delete a large number of files in Linux. You can use find with the -delete option, which is faster than the -exec option. You can also use Perl's unlink, or even rsync. How to delete large number of files in linux
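For orientation, a hedged sketch of the three approaches mentioned, with illustrative paths:

# find with -delete (no per-file rm process):
find /path/to/folder -type f -delete

# Perl unlink over the directory's entries (the same one-liner as in the top answer):
cd /path/to/folder && perl -e 'for(<*>){unlink}'

# rsync from an empty directory:
mkdir /tmp/empty
rsync -a --delete /tmp/empty/ /path/to/folder/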

sarath
  • 69
5

Consider using a Btrfs volume for such a directory with a large number of files, and simply delete the whole volume.

Alternatively, you can create a filesystem image file, mount it and put the files there, then unmount it and delete the image file to remove everything at once really fast.
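A rough sketch of both ideas, with made-up paths, sizes and mount points (not from the original setup):

# Btrfs subvolume: give the directory its own subvolume, then drop it in one go
btrfs subvolume create /data/manysmallfiles
# ... the application writes its millions of small files under it ...
btrfs subvolume delete /data/manysmallfiles

# Loopback image: a throwaway filesystem whose backing file can simply be deleted
truncate -s 20G /var/tmp/scratch.img
mkfs.ext4 -q -F /var/tmp/scratch.img
mount -o loop /var/tmp/scratch.img /mnt/scratch
# ... later ...
umount /mnt/scratch
rm /var/tmp/scratch.img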

4

Assuming you have GNU parallel installed, I've used this:

parallel rm -rf dir/{} ::: `ls -f dir/`

and it was fast enough.

Nacho
  • 171
3

If you have millions of files and every solution above puts your system under stress, you may try this for inspiration:

File nice_delete:

#!/bin/bash

MAX_LOAD=3
FILES=("$@")
BATCH=100

while [ ${#FILES[@]} -gt 0 ]; do
    DEL=("${FILES[@]:0:$BATCH}")
    ionice -c3 rm "${DEL[@]}"
    echo -n "#"
    FILES=("${FILES[@]:$BATCH}")
    while [[ $(cat /proc/loadavg | awk '{print int($1)}') -gt $MAX_LOAD ]]; do
        echo -n "."
        sleep 1
    done
done

And now delete the files:

find /path/to/folder -type f -exec ./nice_delete {} \+

find will create batches (see getconf ARG_MAX) of some tens of thousands of files and pass them to nice_delete, which processes them in even smaller batches, sleeping when overload is detected.

brablc
  • 241
3

This is not applicable to most cases, but a trivial and instant way of fast deletion is renaming the directory and deleting it in the background.

frm () {
    now=$(date "+%F-%T-%Z")
    for file in "$@"
    do
        new_name="${file}_${now}"
        mv -i -- "$file" "$new_name"
        nohup rm -rf -- "$new_name" > "/tmp/$new_name.log" 2>&1 &
    done
}

In my case, I often need to recreate node_modules or a repository entirely, and deleting would take forever. I just rename node_modules to deleting_node_modules and run the rm command in the background.

I get to continue with my work right away.

Kusalananda
  • 333,661
Zack Light
  • 149
  • 1
(1) This answer was given 9 years ago. (2) This doesn't address the question of how to delete the deleting_node_modules directory at all. The question (which I'll admit was unclear) indicated that common methods of deleting the directory were not working at all. – G-Man Says 'Reinstate Monica' Sep 16 '22 at 23:52
  • 1
    Note that with your proposed function, redirecting to the log file would be problematic if the function was given relative or absolute pathnames (i.e. if $file, and subsequently $new_name, contained slashes). – Kusalananda Mar 29 '23 at 04:11
  • 1
    dude, this saved me hours of having to wait for a folder to be deleted before I could run a deployment. Genius solution! – Bogdan Ionitza Aug 04 '23 at 17:31
2

The fastest way to delete all files and folders recursively that I was able to come up with is (it's faster than rsync and faster than everything posted here):

perl -le 'use File::Find; find(sub{unlink if -f}, ".")' && rm -rf *

2

Deleting REALLY LARGE directories needs a different approach, as I learned from this site - you'll need to utilize ionice. It ensures (with -c3) that deletes will only be performed when the system has IO time for them. Your system's load will not rise too high and everything stays responsive (though my CPU time for find was quite high, at about 50%).

find <dir> -type f -exec ionice -c3 rm {} \;
gamma
  • 121
1

Python scripts should not be shunned as unclean:

#!/usr/bin/python3

import shutil
path_for_deletion = input( 'path of dir for deletion> ' ) 
print( 'about to remove ' + path_for_deletion + ' ...' )
shutil.rmtree( path_for_deletion, ignore_errors=True )
print( '... done' )

I've asked the guy who has done some useful benchmarking of various methods here if he could try benchmarking this. From my experiments it seems pretty good.

NB errors could be handled to at least print them out... but it might be simpler to run trash myDirectoryForDeletion or rm -rfv myDirectoryForDeletion afterwards.

mike rodent
  • 1,132
1

I've created a multi-threaded replacement for rm with the sole purpose of being the fastest way to delete files, period. In my benchmarking, at worst it performs 20% faster than anything else, and it tends to be 2-3 times faster than rm.

The tool: https://github.com/SUPERCILEX/fuc/tree/master/rmz
Benchmarks: https://github.com/SUPERCILEX/fuc/tree/master/comparisons#remove

SUPERCILEX
  • 121
0

Use ls -f | xargs -n 5000 rm, while adjusting the -n for batch size as appropriate to your system (kudos to @digital_infinity for -n tip).

Additionally you can filter the listing with an inline grep, e.g. ls -f | grep '^156' | xargs -n 5000 rm.

In my experience this is far faster than techniques using find and obviates the need for more complex shell scripts.

0

Interacting directly with the filesystem is what I love about GNU/Linux.

sudo debugfs -w /dev/mapper/home -R 'unlink .cache/netbeans/15'
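If you try this, note that debugfs only applies to ext2/3/4 (see the comment below), and that unlinking a directory entry this way leaves its inodes and blocks orphaned. As jw013's comment on the question suggests, the usual follow-up is to unmount and fsck the filesystem; be aware that fsck may reconnect the orphans to lost+found rather than free them outright:

umount /dev/mapper/home
e2fsck -fy /dev/mapper/home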
muru
  • 72,889
  • This assumes ext{2,3,4}, it won't work for ZFS, XFS, btrfs, NTFS, exFAT, FAT32, ... – muru Aug 30 '23 at 02:24
0

For Izkata's hint above:

But this does work:

$ rm -r foo/
$ ls foo
ls: cannot access foo: No such file or directory

This almost worked - or would have worked - but I had some permission problems; the files were on a server, and I still don't understand where the permission issue came from. Anyway, the terminal asked for confirmation on every file. The number of files was around 20,000, so this wasn't an option. After "-r" I added the "-f" option, so the whole command was "rm -r -f foldername/". Then it seemed to work fine. I'm a novice with the terminal, but I guess this was okay, right? Thanks!

-1
ls -1 | xargs rm -rf 

should work inside the main folder

  • 1
    ls won't work because of the amount of files in the folder. This is why I had to use find, thanks though. – Toby Apr 26 '12 at 08:19
  • 5
    @Toby: Try ls -f, which disables sorting. Sorting requires that the entire directory be loaded into memory to be sorted. An unsorted ls should be able to stream its output. – camh Apr 26 '12 at 10:59
  • 2
    Does not work on filenames that contain newlines. – maxschlepzig Jan 05 '14 at 07:53
  • @camh that's true. But removing files in sorted order is faster than in unsorted (because of recalculating the btree of the directory after each deletion). See this answer for an example http://serverfault.com/a/328305/105902 – Marki555 Jun 29 '15 at 12:50
  • @maxschlepzig for such files you can use find . -print0 | xargs -0 rm, which will use the NULL char as filename separator. – Marki555 Jun 29 '15 at 12:51
-1

If you just want to get rid of many files as soon as possible, ls -f1 /path/to/folder/with/many/files/ | xargs rm might work okay, but better not to run it on production systems, because your system might develop IO issues and applications might get stuck during the delete operation.

This script works nicely for many files and should not affect the IO load of the system.

#!/bin/bash

# Path to folder with many files
FOLDER="/path/to/folder/with/many/files"

# Temporary file to store file names
FILE_FILENAMES="/tmp/filenames"

if [ -z "$FOLDER" ]; then
    echo "Prevented you from deleting everything! Correct your FOLDER variable!"
    exit 1
fi

while true; do
    FILES=$(ls -f1 "$FOLDER" | wc -l)
    if [ "$FILES" -gt 10000 ]; then
        printf "[%s] %s files found. going on with removing\n" "$(date)" "$FILES"
        # Create new list of files
        ls -f1 "$FOLDER" | head -n 5002 | tail -n 5000 > "$FILE_FILENAMES"

        if [ -s "$FILE_FILENAMES" ]; then
            while read -r FILE; do
                rm "$FOLDER/$FILE"
                sleep 0.005
            done < "$FILE_FILENAMES"
        fi
    else
        printf "[%s] script has finished, almost all files have been deleted" "$(date)"
        break
    fi
    sleep 5
done
-1

Use ncdu and the d option. For me, it worked better than the above options, and you can see how fast you're releasing space.

JulesR
  • 19
-1

Depending on how thoroughly you need to get rid of those files, I'd suggest using shred.

$ shred -zuv folder

If you want to purge the directory, but you can't remove it and recreate it, I suggest moving it and recreating it instantly:

mv folder folder_del
mkdir folder
rm -rf folder_del

This is faster, believe it or not, as only one inode has to be changed. Remember: you can't really parallelize this task on a multicore computer. It comes down to disk access, which is limited by the RAID or what have you.

polemon
  • 11,431